CoPaSul Manual -- Contour-based parametric and superpositional intonation stylization

Abstract

The purposes of the CoPaSul toolkit are (1) automatic prosodic annotation and (2) prosodic feature extraction from syllable to utterance level. CoPaSul stands for contour-based, parametric, superpositional intonation stylization. In this framework intonation is represented as a superposition of global and local contours that are described parametrically in terms of polynomial coefficients. On the global level (usually associated but not necessarily restricted to intonation phrases) the stylization serves to represent register in terms of time-varying F0 level and range. On the local level (e.g. accent groups), local contour shapes are described. From this parameterization several features related to prosodic boundaries and prominence can be derived. Furthermore, by coefficient clustering prosodic contour classes can be obtained in a bottom-up way. Next to the stylization-based feature extraction also standard F0 and energy measures (e.g. mean and variance) as well as rhythmic aspects can be calculated. At the current state automatic annotation comprises: segmentation into interpausal chunks, syllable nucleus extraction, and unsupervised localization of prosodic phrase boundaries and prominent syllables. F0 and partly also energy feature sets can be derived for: standard measurements (as median and IQR), register in terms of F0 level and range, prosodic boundaries, local contour shapes, bottom-up derived contour classes, Gestalt of accent groups in terms of their deviation from higher level prosodic units, as well as for rhythmic aspects quantifying the relation between F0 and energy contours and prosodic event rates.



Uwe D. Reichel
Research Institute for Linguistics, Hungarian Academy of Sciences
Version 0.8.x, October 28th, 2018

Contents

Voice quality
10 Feature sets
11 Configurations
12 Output
13 Plotting
14 Known bugs
15 History
References

Introduction

The purposes of the CoPaSul toolkit are (1) automatic prosodic annotation and (2) prosodic feature extraction from syllable to utterance level.

CoPaSul stands for contour-based, parametric, superpositional intonation stylization. The core model is introduced, among others, in [14]. In this framework intonation is represented as a superposition of global and local contours that are described parametrically in terms of polynomial coefficients. On the global level (usually associated with, but not necessarily restricted to, intonation phrases) the stylization serves to represent register in terms of time-varying f0 level and range. On the local level (e.g. accent groups), local contour shapes are described. From this parameterization several features related to prosodic boundaries and prominence can be derived. Furthermore, by coefficient clustering prosodic contour classes can be derived in a bottom-up way. In addition to the stylization-based feature extraction, standard f0 and energy measures (e.g. mean and variance) as well as rhythmic aspects can be calculated.

At the current state automatic annotation comprises:
• segmentation into interpausal chunks
• syllable nucleus extraction
• unsupervised localization of prosodic phrase boundaries and prominent syllables

F0 and partly also energy feature sets can be extracted for:
• standard measurements (such as median and IQR)
• register in terms of f0 level and range
• prosodic boundaries
• local contour shapes
• bottom-up derived contour classes
• Gestalt of accent groups in terms of their deviation from higher level prosodic units
• rhythmic aspects quantifying the relation between f0 and energy contours and prosodic event rates

Please see section 10 for a list of application examples. The CoPaSul command-line toolkit can be downloaded from this location: http://clara.nytud.hu/~reichelu/copasul.zip

The toolkit is written in Python 3 and depends on the following Python packages (with the specified version or higher):
• matplotlib 1.3.1
• numpy 1.8.2
• pandas 0.13.1
• scipy 0.15.1
• scikit learn 0.17.1

So far the software has been tested only on Linux! The installation steps are:

1. Unzip copasul.zip in your target folder DIR.
2. Change to DIR.
3. Open make.py in a text editor and set the string variable python_path to the Python 3 call for your platform:
https://docs.python.org/3/using/mac.html
https://docs.python.org/3/using/unix.html
https://docs.python.org/3/using/windows.html
4. Call
> python3 make.py
This script adjusts the Python path according to your changes in make.py and inserts DIR into the Python scripts, so that all copasul modules are found.
5. For a command line test, change to DIR and type
> copasul.py -c config/test.json
6. The result should be found in the test/res/ subfolder and should contain:
• csv files with analysis results and corresponding R input code template files to read the csv files for further statistical analyses

The main script copasul.py can be used in a shell or within the Python 3 environment. After changing to the copasul directory, it is called from the shell as follows:
> copasul.py -c myConfigFile.json
e.g.
> copasul.py -c config/test.json
The content of myConfigFile.json is explained in section 11.

Within the Python environment the tool is used this way:
>>> import copasul as copa
>>> myCopa = copa.copasul({'config': myConfig})
>>> myCopa = copa.copasul({'config': myConfig, 'copa': myCopa})
The input argument is a nested dictionary with at least one sub-dictionary config, which contains the configurations (see section 11). copasul() returns the output dictionary myCopa with the extracted feature sets (see section 12.3). If feature extraction should not start from scratch, but an already existing dictionary should be corrected or expanded, this dictionary is passed to the function via the key copa, as shown in the second example.

For shell calls as well as for calls within the Python environment the stylization output is written to a Python pickle file and to csv table files as specified in the configurations. See section 12.5

Input

For automatic annotation CoPaSul needs audio and f0 table files. For feature extraction it additionally needs annotation files. For the voice feature set, pulse table files are needed as well. Corresponding files do not necessarily need to have the same name stem, but it is assumed that all audio, f0, and annotation files are sorted in the same order. An example can be found in the input subdirectory. Additionally, a configuration file in JSON format is needed, as further specified in section 11.

Currently only wav files are supported. The files can be mono or stereo. For conversion to wav, software such as Praat, Audacity, or SoX can be used.

F0 files are plain text tables with whitespace as column separator. The first column contains time information. All further columns contain the f0 of the respective channel. For mono files f0 tables thus consist of 2 columns, for stereo files of 3, etc. All columns need to have the same length. Undefined f0 values are to be replaced by 0. Only a 100 Hz sample rate is supported; f0 input at other rates is resampled to 100 Hz. The Praat scripts extract_f0.praat and extract_f0_stereo.praat, which are contained in this package, provide the required input format.
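The f0 table layout described above can be checked with a few lines of Python. The following is a minimal sketch, not part of the toolkit: the function name read_f0_table and the example values are hypothetical, and only the layout rules stated above (time in the first column, one f0 column per channel, 0 for undefined values) are taken from the text.

```python
import numpy as np

def read_f0_table(text):
    """Parse a whitespace-separated f0 table.

    Column 0 holds the time stamps, each further column the f0 of one channel.
    Rows of unequal length make the float conversion fail with a ValueError.
    """
    rows = [line.split() for line in text.splitlines() if line.strip()]
    table = np.array(rows, dtype=float)
    if table.ndim != 2 or table.shape[1] < 2:
        raise ValueError("expected a time column plus at least one f0 column")
    time, f0 = table[:, 0], table[:, 1:]
    voiced = f0 > 0  # 0 marks undefined f0 values
    return time, f0, voiced

# Hypothetical stereo example: time, f0 of channel 1, f0 of channel 2
example = """0.00 0 0
0.01 121.5 0
0.02 122.0 210.3
"""
time, f0, voiced = read_f0_table(example)
```

The boolean array voiced marks the frames that carry a defined f0 value, which is convenient for later interpolation over undefined stretches.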

Pulse files are plain text tables with whitespace as column separator. They are only needed for the voice feature extraction. Each column contains the pulse time stamps for one channel in seconds. All columns must contain the same number of rows, so that for files with more than one channel the shorter columns have to be padded with -1. The Praat scripts extract_pulse.praat and extract_pulse_stereo.praat, which are contained in this package, provide the required input format.
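The padding rule for multi-channel pulse tables can be illustrated as follows. This is a sketch under the assumptions stated in the comments; the helper names pulse_table and format_pulse_table are hypothetical and do not refer to functions of the toolkit or of the Praat scripts.

```python
def pulse_table(channels, pad=-1):
    """Arrange per-channel pulse time stamps (in seconds) as table rows.

    Shorter channels are padded with `pad` so that all columns have the
    same number of rows.
    """
    n_rows = max(len(ch) for ch in channels)
    padded = [list(ch) + [pad] * (n_rows - len(ch)) for ch in channels]
    return [list(row) for row in zip(*padded)]

def format_pulse_table(rows):
    """Render the rows as a whitespace-separated plain text table."""
    return "\n".join(" ".join(str(v) for v in row) for row in rows)

# Hypothetical stereo example: channel 2 has one pulse fewer than channel 1
rows = pulse_table([[0.10, 0.11, 0.12], [0.25, 0.26]])
text = format_pulse_table(rows)
```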

For annotation files, the Praat TextGrid format (long and short) and an XML format are supported. As an example of the XML content, a segment tier named mySegmentTier holds an item with label x, start time 0.3 and end time 0.9, and an event tier named myEventTier holds an item with label y at time 0.7.

The tiers need to be stored in the tiers subtree right below the root element. Each tier must have a name assigned by the element name. The items of each tier are collected in the items subtree, in which each item is stored in an item subtree. Segment tiers (see next section) must contain the elements label, t_start, t_end. Event tiers must contain the elements label, t. The XML annotation file can be extended by the user as long as it fulfills the specified requirements in the tiers subtree.
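The XML requirements listed above can be made concrete with Python's xml.etree.ElementTree. The sketch below builds and re-reads a small annotation tree. The element names tiers, name, items, item, label, t_start, t_end and t follow the description above; the root element name annotation and the per-tier element name tier are assumptions made for illustration only, as are the helper names.

```python
import xml.etree.ElementTree as ET

def add_tier(tiers, name, items):
    """Append one tier with its items to the <tiers> subtree."""
    tier = ET.SubElement(tiers, "tier")  # element name "tier" is an assumption
    ET.SubElement(tier, "name").text = name
    items_el = ET.SubElement(tier, "items")
    for fields in items:
        item = ET.SubElement(items_el, "item")
        for key, value in fields.items():
            ET.SubElement(item, key).text = str(value)
    return tier

def tier_kind(tier):
    """Classify a tier as 'segment' or 'event' by the fields of its first item."""
    item = tier.find("items/item")
    fields = {child.tag for child in item}
    if {"label", "t_start", "t_end"} <= fields:
        return "segment"
    if {"label", "t"} <= fields:
        return "event"
    raise ValueError("item fulfills neither the segment nor the event requirements")

root = ET.Element("annotation")  # root element name is an assumption
tiers = ET.SubElement(root, "tiers")
add_tier(tiers, "mySegmentTier", [{"label": "x", "t_start": 0.3, "t_end": 0.9}])
add_tier(tiers, "myEventTier", [{"label": "y", "t": 0.7}])
xml_text = ET.tostring(root, encoding="unicode")

# Read the tree back and classify each tier
parsed = ET.fromstring(xml_text)
kinds = {t.findtext("name"): tier_kind(t) for t in parsed.find("tiers")}
```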

In the following the notation a:b:... refers to branches through the configuration dictionary, which is introduced in section 11. The annotation files can contain tiers of the following types:

Segment tiers contain items defined by a label, a start point and an end point. They correspond to Praat IntervalTiers.

Event tiers contain items without a temporal extension. They are defined by a label and a time stamp and correspond to Praat TextTiers.

Both segment tiers and event tiers are supported for most of the analyses. Wherever needed, an event is converted to a segment by centering a window of length preproc:point_win on the event, as explained in more detail in section 8.3. Pause information can only be extracted for segment tiers. In TextGrids, pauses are considered to be items with empty labels or labeled as fsys:label:pau. Both event and segment tiers can serve as:

Analysis tiers

In the context of automatic annotation these tiers contain or limit the candidate locations for prosodic events. They can be segment or event tiers.
fsys:augment:glob:tier
fsys:augment:loc:tier_acc
fsys:augment:loc:tier_ag

For feature extraction these segment or event tiers define the units of analysis.
fsys:chunk:tier
fsys:glob:tier
fsys:loc:tier_acc
fsys:loc:tier_ag
fsys:bnd:tier
fsys:gnl_f0:tier
fsys:gnl_en:tier
fsys:rhy_f0:tier
fsys:rhy_en:tier

Parent tiers

Parent tiers (1) limit the analysis and normalization windows by their segment boundaries. As an example, normalization across chunk boundaries can be suppressed. (2) They limit the domain of global trends against which local deviation is measured. It is strongly recommended to use segment tiers for this purpose. If not specified, the whole file is treated as a single parent segment. For automatic annotation parent tiers are to be defined by:
fsys:augment:syl:tier_parent
fsys:augment:glob:tier_parent
fsys:augment:loc:tier_parent

For glob, bnd, gnl_en, gnl_f0, rhy_en, rhy_f0 feature extraction (see section 10) only speech chunks can serve as parent domains:
fsys:chunk:tier

Fallback is again the entire file. For loc feature extraction only the segments of the glob analysis tier can form the parent domain due to the:

Superpositional framework

Within the CoPaSul approach (see section 8.4) the intonation contour is considered as a superposition of global and local components. Their domains are defined by the glob and loc option branches, respectively:
fsys:glob:tier
fsys:loc:tier_acc
fsys:loc:tier_ag

This has two implications for the annotation tier definitions:
• for each channel only one tier each is supported for the global and the local domain
• the global domain tier is treated as the parent tier for the local domain tier

Output tiers

For automatic annotation these tiers are defined by a stem which is always expanded by the recording channel index.
fsys:augment:chunk:tier_out_stm
fsys:augment:syl:tier_out_stm
fsys:augment:glob:tier_out_stm
fsys:augment:loc:tier_out_stm

As an example, given a stereo file and the chunk output tier name CHUNK, the tiers CHUNK_1 and CHUNK_2 will be added to the annotation file. For the sake of uniform treatment, the channel index is also added for mono files.

Tier speciﬁcation

For all tiers that were not automatically generated, the user needs to specify the recording channel index they refer to (also for mono files!), e.g.:
fsys:channel:'tierA'=1
fsys:channel:'tierB'=2
tierA thus refers to channel 1, and tierB to channel 2. Tier names can be specified as strings or as lists of strings.
fsys:bnd:tier='tierA'
means that the bnd feature extraction is to be carried out for units defined by the content of tierA.
fsys:bnd:tier=['tierA','tierB']
triggers a bnd feature extraction for the content of two tiers. The channels the specified tiers refer to are looked up in fsys:channel:*.

The name stem of a tier resulting from automatic annotation (e.g. CHUNK) will be expanded automatically, thus for a chunked stereo file these two specifications are equivalent:
fsys:bnd:tier='CHUNK'
fsys:bnd:tier=['CHUNK_1', 'CHUNK_2']
For the feature sets bnd, gnl_en, gnl_f0, rhy_en, rhy_f0 (see section 10) an arbitrary number of tiers can be specified for each channel. For chunk, glob, loc only one tier per channel is supported.

For f0 extraction in mono or stereo wav files the two Praat scripts contained in this package can be used. They can be called this way:
> praat extract_f0.praat myStepsize myMinFreq myMaxFreq \
myAudioInputDir myF0OutputDir myAudioExt myF0Ext

The usage of extract_f0_stereo.praat is the same. Note that subsequent stylization in any case initiates a resampling to 100 Hz, so that myStepsize can be directly set to 0.01 here. myMinFreq and myMaxFreq refer to the minimum and maximum of allowed f0 values in Hz. Values below or above are considered measurement errors and are set to 0. The f0 range choice depends on the recorded speakers. As a rule of thumb the parameters can be set to 50 and 400 Hz, respectively. In myAudioInputDir the sound files with the extension myAudioExt are collected, and corresponding f0 plain text table files with the audio file's name stem and the extension myF0Ext are written to the directory myF0OutputDir.

Pulse extraction is needed for the voice feature set only. For pulse extraction in mono or stereo wav files the two Praat scripts contained in this package can be used. They can be called this way:
> praat extract_pulse.praat myMinFreq myMaxFreq \
myAudioInputDir myPulseOutputDir myAudioExt myPulseExt

The usage of extract_pulse_stereo.praat is the same. The scripts make use of Praat's To PointProcess (cc) routine operating on sound and pitch objects. For pitch object creation the minimum and maximum of allowed f0 values myMinFreq and myMaxFreq need to be specified in Hz. In myAudioInputDir the sound files with the extension myAudioExt are collected, and corresponding pulse plain text table files with the audio file's name stem and the extension myPulseExt are written to the directory myPulseOutputDir.

Automatic annotation

Automatic unsupervised prosodic annotation comprises chunking, syllable nucleus and boundary extraction, prosodic phrase extraction, and pitch accent localization. Details of the algorithms will be given in [16]. At the beginning of each introductory paragraph it is specified:
navigation: which navigation option to set to True in the configuration file (see section 11)
feature sets: which feature sets result from the annotation (see section 10)
option sub-dictionary: which configuration sub-dictionaries serve to customize the respective processing (see section 11)
output sub-dictionary: which sub-dictionary of the resulting Python nested dictionary contains the extracted feature set (see section 12.3).

Paths through the configuration dictionary are referred to by my:path:to:option.

navigation: do_augment_*
feature sets: –
option sub-dictionary: fsys:augment:*:*; augment:*:*
output sub-dictionary: (augmented annotation file)

navigation: do_augment_chunk
feature sets: –
option sub-dictionary: fsys:augment:chunk:*; augment:chunk:*
output sub-dictionary: (augmented annotation file)

Chunking serves to segment the utterance into interpausal units. It is based on a pause detector that works the following way: an analysis window $w_a$ with length augment:chunk:l is moved over the lowpass-filtered signal together with a longer reference window $w_r$ of length augment:chunk:l_ref with the same midpoint. A pause is set where the mean energy in $w_a$ is below a threshold defined relative to the energy in $w_r$, i.e. if $e(w_a) < e(w_r) \cdot$ augment:chunk:e_rel. Chunks are then trivially assigned to interpausal intervals. Silence margins can be set at chunk starts and ends by augment:chunk:margin.

If $w_r$ itself is identified as a pause by $e(w_r) < e(s) \cdot$ augment:chunk:e_rel, it is replaced by $s$, where $s$ consists of selected parts of the acoustic signal in the analysed channel with absolute amplitude values above the median. This lower threshold increases the robustness against a high occurrence of speech pauses.

The filtering of the signal can be customized by the sub-dictionary augment:chunk:flt. In there btype gives the Butterworth filter type (high, low, band, or none), f the cutoff frequency or frequencies, and ord the order. For pauses as well as for inter-pause intervals minimum lengths can be defined by augment:min_pau_l and min_chunk_l, respectively. Pauses are then merged across too short chunks, and chunks are merged across too short pauses. The segment tier output will be added to the annotation file. The tier name is specified by fsys:augment:chunk:tier_out_stm concatenated with the respective channel index.
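The pause detector described above can be sketched in a few lines of NumPy. This is an illustration under explicit assumptions, not the toolkit's implementation: energy is taken as mean squared amplitude, the lowpass filtering, the silence margins and the minimum-length merging are omitted, and the function names, the step size and all parameter values are hypothetical.

```python
import numpy as np

def mean_energy(x):
    """Mean squared amplitude; the exact energy measure is an assumption."""
    return float(np.mean(x ** 2)) if len(x) else 0.0

def window(sig, mid, length):
    """Samples of a window of `length` samples centered on index `mid`."""
    half = length // 2
    return sig[max(0, mid - half): mid + half + 1]

def detect_pauses(sig, sr, l=0.15, l_ref=5.0, e_rel=0.1, step=0.01):
    """Return (start, end) times in seconds of detected pauses."""
    sig = np.asarray(sig, dtype=float)
    n_a, n_r = int(round(l * sr)), int(round(l_ref * sr))
    hop = max(1, int(round(step * sr)))
    # s: signal parts with absolute amplitude above the median
    loud = sig[np.abs(sig) > np.median(np.abs(sig))]
    e_s = mean_energy(loud)
    mids = np.arange(0, len(sig), hop)
    is_pause = np.zeros(len(mids), dtype=bool)
    for k, mid in enumerate(mids):
        e_a = mean_energy(window(sig, mid, n_a))
        e_r = mean_energy(window(sig, mid, n_r))
        if e_r < e_s * e_rel:  # reference window itself looks like a pause
            e_r = e_s
        is_pause[k] = e_a < e_r * e_rel
    # collapse runs of pause frames into intervals
    pauses, start = [], None
    for k, flag in enumerate(is_pause):
        if flag and start is None:
            start = mids[k] / sr
        elif not flag and start is not None:
            pauses.append((start, mids[k] / sr))
            start = None
    if start is not None:
        pauses.append((start, len(sig) / sr))
    return pauses

# Synthetic example: 1 s tone, 1 s silence, 1 s tone
sr = 1000
tone = np.sin(2 * np.pi * 100 * np.arange(sr) / sr)
sig = np.concatenate([tone, np.zeros(sr), tone])
pauses = detect_pauses(sig, sr, l=0.1, l_ref=1.0, e_rel=0.1)
```

The chunks are then the intervals between the returned pauses. Because the analysis window has a temporal extension, the detected pause is slightly shorter than the true silence.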
Standard labels 'x' are assigned to chunk segments, and fsys:label:pau to the pauses in between.

navigation: do_augment_syl
feature sets: –
option sub-dictionary: fsys:augment:syl:*; augment:syl:*
output sub-dictionary: (augmented annotation file)

For syllable nucleus detection the method proposed by [13] is adopted. Again an analysis window $w_a$ with length augment:syl:l and a longer reference window $w_r$ of length augment:syl:l_ref with the same midpoint are moved along the signal, which this time is band-pass filtered to focus on the frequency band related to vocalic nuclei. The filter specification in augment:syl:flt works as described for chunking. From this energy contour the local maxima are extracted. If for a local maximum the mean energy in $w_a$ exceeds the mean energy in $w_r$ by a defined factor, i.e. if $e(w_a) > e(w_r) \cdot$ augment:syl:e_rel, and if $e(w_a)$ is not below a defined fraction of the energy in the current chunk $w_c$ (fallback: whole file), i.e. $e(w_a) \geq e(w_c) \cdot$ augment:syl:e_min, a syllable nucleus is set. The tier from which to get the current chunk is defined by augment:syl:tier_parent; e.g. it can be the output tier of a preceding chunking step. A further constraint augment:syl:d_min specifies the minimum distance between subsequent syllable nuclei. If two nuclei are too close, they are merged to a single syllable and the point of energy maximum in this interval is assigned to be the nucleus.

Subsequently syllable boundaries are assigned to the energy minimum between adjacent syllable nuclei. They just serve as fallback prosodic boundary candidates.

The output consists of two event tiers for syllable nuclei and boundaries and will be added to the annotation file. The tier name is specified by fsys:augment:syl:tier_out_stm. For the nuclei it is concatenated with the respective channel index. For the boundaries it is concatenated with a 'bnd' infix and the channel index. Standard labels 'x' are assigned for both tiers.
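The nucleus and boundary criteria above can be sketched as follows. The example operates on an already computed energy contour and treats the whole contour as the current chunk (the fallback case). It is a simplified illustration: band-pass filtering is omitted, merging of close nuclei only compares the two neighbouring maxima, and the function names and all parameter values are hypothetical rather than the toolkit's defaults.

```python
import numpy as np

def syllable_nuclei(en, sr, l=0.08, l_ref=0.15, e_rel=1.05, e_min=0.16, d_min=0.05):
    """Return nucleus times (s) from an energy contour sampled at `sr` Hz."""
    en = np.asarray(en, dtype=float)
    n_a, n_r = int(round(l * sr)), int(round(l_ref * sr))
    e_chunk = en.mean()  # fallback parent domain: the whole file

    def win_mean(i, n):
        half = n // 2
        return en[max(0, i - half): i + half + 1].mean()

    # candidate positions: local maxima of the energy contour
    peaks = [i for i in range(1, len(en) - 1) if en[i - 1] < en[i] >= en[i + 1]]
    nuclei = []
    for i in peaks:
        e_a, e_r = win_mean(i, n_a), win_mean(i, n_r)
        if e_a > e_r * e_rel and e_a >= e_chunk * e_min:
            if nuclei and (i - nuclei[-1]) / sr < d_min:
                # too close to the previous nucleus: keep the energy maximum
                if en[i] > en[nuclei[-1]]:
                    nuclei[-1] = i
            else:
                nuclei.append(i)
    return [i / sr for i in nuclei]

def syllable_boundaries(en, nuclei, sr):
    """Place a boundary at the energy minimum between adjacent nuclei."""
    en = np.asarray(en, dtype=float)
    idx = [int(round(t * sr)) for t in nuclei]
    return [(a + int(np.argmin(en[a:b + 1]))) / sr for a, b in zip(idx, idx[1:])]

# Synthetic energy contour with two bumps at 0.2 s and 0.6 s
sr = 100
t = np.arange(sr) / sr
en = np.exp(-((t - 0.2) / 0.05) ** 2) + np.exp(-((t - 0.6) / 0.05) ** 2)
nuclei = syllable_nuclei(en, sr)
boundaries = syllable_boundaries(en, nuclei, sr)
```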
7.3 Prosodic phrase boundary location

navigation: do_augment_glob
feature sets: –
option sub-dictionary: fsys:augment:glob:*; augment:glob:*
output sub-dictionary: (augmented annotation file)

Prosodic phrase boundary decisions are based on nearest centroid classification. The user needs to specify the tier that contains boundary candidates in fsys:augment:glob:tier. For segment tiers these candidates are the segment boundaries; for event tiers, the candidates are the time stamps. If no tier is specified, syllable boundaries derived by step 7.2 will be selected as candidates. At each boundary candidate a feature set is extracted that has been shown to be related to prosodic boundaries in former studies [20, 21]. This feature set is introduced in section 8.9. The user needs to specify which of these features should be selected by augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+.

In case a phone segment tier is available and if centroids are derived from the entire data set and not separately for each file (see below), in addition z-scored vowel length can be used as a feature. The length of the vowel associated with the prosodic event candidate is divided by its mean length derived from the entire dataset. The associated vowel is the last vowel segment with an onset before the boundary candidate time stamp. The length feature can be added by:
augment:glob:wgt:pho=1

The phonetic segment tiers (one for each channel) are to be specified in
fsys:pho:tier

Vowels are identified in these tiers by a regular expression stored in
fsys:pho:vow

This feature will be beneficial for languages in which phrase boundaries and/or accents are marked by phone segment lengthening.

Furthermore the user can select whether the current feature values at time $i$, $v_i$, or the delta values (i.e. the differences to the preceding values, $v_i - v_{i-1}$), or both should be taken:
augment:glob:measure

Some features require units from a parent tier, which is to be specified by augment:glob:tier_parent, e.g. to measure local f0 trend discontinuities within a superordinate unit and to limit analysis and normalization windows. Such units are e.g. chunks derived from preceding chunking. Fallback is the entire file.

From the features, for each of the two classes boundary B and no boundary NB a centroid can be bootstrapped in several ways given the specification in augment:glob:cntr_mtd, as described in the following sections. Centroids can be calculated separately for each file or over the entire data set by setting the value of augment:glob:unit to file or batch, respectively. The latter is strongly recommended for corpora containing lots of short recordings.

augment:glob:cntr_mtd=split
augment:glob:prct=mySplitPoint

Since for all extracted pause length and pitch discontinuity boundary features a positive correlation to perceived boundary strength has been found [20, 21], B and NB centroids can be straightforwardly derived from high and low feature values, respectively. Centroids are thus derived by splitting each column in the feature matrix at the percentile augment:glob:prct. The B centroid is defined by the median of the values above the split point, the NB centroid by the median of the values below. All feature vectors are then assigned to the nearest centroid in a single pass. Boundaries are subsequently inserted at all candidate time points classified as B. This method works for both segment and event tier input.

augment:glob:cntr_mtd=seed_kmeans
augment:glob:min_l=myMinPhraseLength

This procedure works for segment tier input only since it makes use of pauses between adjacent segments. As visualized in Figure 1, B and NB centroids are bootstrapped based on two assumptions: (1) each pause indicates a prosodic boundary, and (2) prosodic phrases have a minimum length, thus in the vicinity of pauses there are no further boundaries. KMeans clustering is then initialized by these two centroids and subdivides all candidates into the B and NB cluster. Boundaries are inserted at all candidate time points belonging to the B cluster.

Figure 1: Bootstrapping seed centroids for the classes 1 (boundary) and 0 (no boundary). Word boundaries are indicated by the short vertical lines. Assumptions: each pause indicates a prosodic boundary (green), and prosodic phrases have a minimum length (red window), thus in the vicinity of pauses there are no further boundaries (blue).

augment:glob:cntr_mtd=seed_prct
augment:glob:prct=mySplitPoint
augment:glob:min_l=myMinPhraseLength

The seed centroid bootstrapping works as for the preceding method. Instead of kMeans, for the remaining feature vectors the Euclidean distance to the NB seed centroid is calculated. Vectors with a distance above the mySplitPoint-th percentile of all measured distances are assigned to the B class, the others to the NB class.
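The distance-percentile assignment just described can be sketched as follows. The seed rows are assumed to be given already, i.e. the bootstrapping from pauses and minimum phrase length is not part of the example. Computing the NB seed centroid as the mean of the seed vectors is an assumption, and the function name and the data are hypothetical.

```python
import numpy as np

def seed_prct_classify(feats, b_seed, nb_seed, prct):
    """Return a boolean array that is True for candidates assigned to class B.

    feats: feature matrix (candidates x features)
    b_seed, nb_seed: row indices of the B and NB seed candidates
    prct: split percentile applied to the distances of the remaining rows
    """
    feats = np.asarray(feats, dtype=float)
    is_b = np.zeros(len(feats), dtype=bool)
    is_b[b_seed] = True
    seeds = np.concatenate([b_seed, nb_seed])
    rest = np.setdiff1d(np.arange(len(feats)), seeds)
    # centroid as mean of the NB seed vectors (assumption)
    nb_centroid = feats[nb_seed].mean(axis=0)
    dist = np.linalg.norm(feats[rest] - nb_centroid, axis=1)
    is_b[rest] = dist > np.percentile(dist, prct)
    return is_b

# Hypothetical one-dimensional feature matrix with 8 boundary candidates
feats = [[0.0], [0.1], [0.2], [1.0], [0.9], [0.15], [0.8], [0.05]]
is_b = seed_prct_classify(feats, b_seed=[3], nb_seed=[0, 1], prct=60)
```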

The percentile split method works for both segment and event tiers, whereas the two centroid bootstrapping methods need segment tier input to infer pause locations. For the two percentile split approaches, the parameter augment:glob:prct serves to control the number of inserted boundaries. The higher its value, the smaller the B class, and thus the fewer boundaries will be assigned.

If a text transcription is at hand, the user can ensure that prosodic boundaries only occur at word boundaries by preceding signal-text alignment, e.g. by WebMAUS [25, 9].
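For the plain percentile split method, centroid derivation and the single-pass nearest centroid assignment can be sketched as follows. This is a minimal illustration with hypothetical function names and data. Assigning values equal to the split point to the lower part is an assumption, and a column in which no value lies above the split point would yield an undefined centroid component.

```python
import numpy as np

def split_centroids(feats, prct):
    """Derive B and NB centroids by splitting each feature column at a percentile."""
    feats = np.asarray(feats, dtype=float)
    thresh = np.percentile(feats, prct, axis=0)
    c_b, c_nb = [], []
    for j in range(feats.shape[1]):
        col = feats[:, j]
        c_b.append(np.median(col[col > thresh[j]]))    # values above the split point
        c_nb.append(np.median(col[col <= thresh[j]]))  # remaining values
    return np.array(c_b), np.array(c_nb)

def nearest_centroid(feats, c_b, c_nb):
    """Single-pass assignment: True where a vector is closer to the B centroid."""
    feats = np.asarray(feats, dtype=float)
    d_b = np.linalg.norm(feats - c_b, axis=1)
    d_nb = np.linalg.norm(feats - c_nb, axis=1)
    return d_b < d_nb

# Hypothetical feature matrix: 6 boundary candidates, 2 features
feats = [[0.1, 0.0], [0.2, 0.1], [0.15, 0.05], [0.9, 1.0], [1.0, 0.8], [0.12, 0.02]]
c_b, c_nb = split_centroids(feats, prct=50)
is_b = nearest_centroid(feats, c_b, c_nb)
```

Raising prct moves the B centroid towards more extreme feature values, so fewer candidates end up closer to it.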

Heuristics

augment:glob:heuristics=ORT

If set by the user, this heuristic assumes a word segment tier as input and rejects boundaries after words that are too short and thus probably function words (< . s).

augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+=myWeight
augment:glob:wgt_mtd=myWeightingMethod

By the augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+ branches the user at the same time selects and weights features. As an example
augment:glob:wgt:win:ml:rms=1
selects the feature rms derived from the register representation ml within the boundary feature set win (see sections 8.9 and 10 for explanations). If the weighting method in augment:glob:wgt_mtd is set to 'user', the weight of this feature becomes 1. If no weighting is intended, the same weight should be assigned to all selected features. As an alternative to the definition by the user, weights can also be extracted by correlation to the median or by the cluster silhouette measure.

Correlation

Each feature is correlated with the medians of the feature vectors. Since, as mentioned, all boundary features are expected to be positively correlated with boundary strength, and since the median is expected to be more robustly related to boundary strength than single features, the correlation between a feature and the medians to some extent reflects how well this feature predicts boundary strength. Features with a negative correlation to the median will be removed from the pool. All remaining correlations are transformed to weights summing up to 1 by dividing them by the sum of correlations.
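The correlation-based weighting can be sketched as follows. The example assumes Pearson correlation (np.corrcoef), which the text does not specify, and gives weight 0 to removed features. The function name and the data are hypothetical. A constant feature column would produce an undefined correlation and is treated as removed here.

```python
import numpy as np

def correlation_weights(feats):
    """Weight each feature by its correlation with the per-vector medians."""
    feats = np.asarray(feats, dtype=float)
    med = np.median(feats, axis=1)  # median of each feature vector
    r = np.array([np.corrcoef(feats[:, j], med)[0, 1]
                  for j in range(feats.shape[1])])
    keep = r > 0  # negatively correlated features are removed from the pool
    w = np.zeros_like(r)
    w[keep] = r[keep] / r[keep].sum()  # remaining weights sum up to 1
    return w

# Hypothetical data: features 0 and 1 rise together, feature 2 falls
feats = [[1, 2, 5], [2, 3, 4], [3, 5, 3], [4, 6, 1]]
w = correlation_weights(feats)
```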

Silhouette

The mean silhouette over all clustered data points measures how well clusters can be separated. Here it is measured separately for each feature within the clearly assignable feature vectors from which the B and NB seed centroids were derived. It is minmax-normalized to the range [0, 1].

7.3.6 Output

The output consists of a segment tier for each channel with the name fsys:glob:tier_out_stm + channelIndex. Each segment spans the interval between two subsequent B events. If fsys:glob:tier is a segment tier, then pauses are taken over from this tier. Standard labels 'x' are assigned to the prosodic phrase segments.

Pitch accents are derived in a bootstrap fashion analogous to prosodic boundaries. The user needs to specify an event tier (default: syllable nuclei) for localization of the pitch accent candidates. Furthermore the user can specify a segment tier (e.g. words) to restrict the maximum number of detected pitch accents within each segment to 1.
fsys:augment:loc:tier_acc
fsys:augment:loc:tier_ag

Given a segment tier, the user can furthermore specify (1) whether each segment should get an accent or only the prominent ones
augment:loc:ag_select
and (2) where within a segment an accent should be placed: left- or rightmost, e.g. for prosodically left- or right-headed languages, or on the most prominent candidate.
augment:loc:acc_select
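The placement choice within a segment can be illustrated as follows. The sketch assumes that the candidates passed in have already been classified as accented and carry some prominence score. The option values 'left', 'right' and 'max' as well as the function name are hypothetical stand-ins for whatever values augment:loc:acc_select accepts.

```python
def select_accents(candidates, segments, acc_select="max"):
    """Keep at most one accent candidate per segment.

    candidates: list of (time, prominence) tuples
    segments: list of (t_start, t_end) tuples, e.g. words
    acc_select: 'left', 'right', or 'max' (most prominent); hypothetical values
    """
    selected = []
    for t_start, t_end in segments:
        inside = [c for c in candidates if t_start <= c[0] < t_end]
        if not inside:
            continue
        if acc_select == "left":
            selected.append(min(inside, key=lambda c: c[0]))
        elif acc_select == "right":
            selected.append(max(inside, key=lambda c: c[0]))
        else:
            selected.append(max(inside, key=lambda c: c[1]))
    return selected

# Hypothetical candidates (time in s, prominence) within two word segments
candidates = [(0.1, 0.2), (0.3, 0.9), (0.6, 0.5), (0.8, 0.4)]
segments = [(0.0, 0.5), (0.5, 1.0)]
most_prominent = select_accents(candidates, segments, "max")
leftmost = select_accents(candidates, segments, "left")
rightmost = select_accents(candidates, segments, "right")
```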

Prominence can be parameterized by several feature sets measuring standard f0 and energy features, contour shapes within local segments, and their deviation from a global declination trend.

The user can select whether the current feature values at time $i$, $v_i$, or the delta values (i.e. the differences to the preceding values, $v_i - v_{i-1}$), or both should be taken:
augment:loc:measure

Some features require units from a parent tier, which is to be specified by augment:loc:tier_parent, e.g. to measure local f0 deviations relative to some superordinate unit and to limit analysis and normalization windows. Such units are e.g. prosodic phrases derived from preceding phrase extraction. Fallback is the entire file.

From these features, for each of the two classes accented A and not accented NA a centroid can be bootstrapped in several ways analogously to the prosodic boundary extraction, this time given the specification in augment:loc:cntr_mtd. Centroids can be calculated separately for each file or over the entire data set by setting the value of augment:loc:unit to file or batch, respectively. The latter is strongly recommended for corpora containing lots of short recordings.

augment:loc:cntr_mtd=split
augment:loc:prct=mySplitPoint

Given a user-defined feature set where for each feature high values indicate prominence, A and NA centroids can be straightforwardly derived from high and low feature values, respectively. Centroids are thus derived by splitting each column in the feature matrix at the percentile augment:loc:prct. The A centroid is defined by the median of the values above the split point, the NA centroid by the median of the values below. All feature vectors are then assigned to the nearest centroid in a single pass. Accents are then assigned to all candidate time points classified as A. This method works for both segment and event tier input.

augment:loc:cntr_mtd=seed_kmeans
augment:loc:max_l_na=myMaxLengthNA
augment:loc:min_l_a=myMinLengthA
augment:loc:min_l=myMinLengthAG

This procedure works only if a segment tier is provided next to the event tier, and if this segment tier contains word-like units. As for the phrase boundary detection described above, there are two (this time even more strongly) simplifying assumptions to derive seed centroids for cluster initialization (cf. Figure 2): (1) each word longer than augment:loc:min_l_a contains an accent, due to its expected high information content; (2) each word shorter than augment:loc:max_l_na does not contain an accent, due to its expected low information content. Depending on augment:loc:acc_select the A centroid is then calculated from all leftmost, rightmost, or most prominent tier_acc candidates in the tier_ag segments fulfilling criterion (1). The NA centroid is calculated from all tier_acc candidates in the tier_ag segments fulfilling criterion (2). KMeans clustering is then initialized by these two centroids and subdivides all candidates into the A and NA cluster. Multiple A cases within the same segment are reduced by augment:loc:acc_select. Furthermore, among A cases closer than augment:loc:min_l only the more prominent ones are kept.

Figure 2: Bootstrapping seed centroids for the classes 1 (accent) and 0 (no accent). Word boundaries are indicated by long vertical lines, and syllable nuclei by short vertical lines. Prominence is encoded by the size of the triangles. Assumptions: each word longer than some threshold contains an accent (green); each word shorter than some threshold does not contain an accent (blue). Within the accented word the accent is placed on the most prominent syllable (as in this example), or on the left- or rightmost syllable.

augment:loc:cntr_mtd=seed_prct
augment:loc:prct=mySplitPoint
augment:loc:max_l_na=myMaxLengthNA
augment:loc:min_l_a=myMinLengthA

The seed centroid bootstrapping works as for the preceding method. Instead of kMeans, for the remaining feature vectors the Euclidean distance to the NA seed centroid is calculated. Vectors with a distance above the mySplitPoint-th percentile of all measured distances are assigned to the A class, the others to the NA class.

The percentile split method works with and without segment tiers, whereas the two centroid bootstrapping methodsneed segment tier input next to the event tier to infer word length. As with boundary detection, the parameter augment:loc:prct serves to control for the number of assigned accents. The higher, the smaller the A class, thus thefewer accents will be assigned.As mentioned for prosodic boundary detection, a supporting word segmentation can be derived by precedingsignal-text alignment, e.g. by WebMAUS [25, 9]. augment:loc:wgt:myFeatset+:...

The same selection and weighting mechanisms apply as described in section 7.3.5.The following feature sets can be used: acc, gst, gnl f0, gnl en (see section 10). In section 11 examples are givenhow to expand the corresponding conﬁguration branches.As for boundary detection also for pitch accent detection z-scored vowel length can be added to the feature set.The vowel interval associated to a pitch accent candidate includes the candidate’s time stamp. See section 7.3 forfurther details. The length feature can be added by: augment:loc:wgt:pho=1

The output consists of an event tier for each channel with the name fsys:loc:tier_out_stm + channelIndex. The standard label 'x' is assigned to each accent.

In the following the f0 preprocessing and the f0 and energy stylization steps are introduced. For each stylization step the following is specified:

navigation: which navigation option to set to True in the configuration file (see section 11)
feature sets: which feature sets result from the stylization (see section 10)
option sub-dictionary: which configuration parts serve to customize the respective processing (see section 11)
output sub-dictionary: which part of the resulting Python nested dictionary variable contains the extracted feature set (see section 12.3)

Branches through the configuration as well as through the result dictionary are referred to by my:branch:to:value.

8.1 F0 preprocessing

F0 preprocessing comprises resampling to 100 Hz, outlier detection, interpolation over outliers and voiceless utterance parts, smoothing, and semitone conversion including speaker normalization.

Outliers

Outliers are identified separately for each channel in a file. They are defined in terms of deviation from a mean value or from the 1st and 3rd quartile. The deviation factor is controlled by preproc:out:f, and the reference point by preproc:out:m. For m=mean outliers lie outside the interval $[m - f \cdot sd,\; m + f \cdot sd]$. For m=median outliers lie outside of $[m - f \cdot iqr,\; m + f \cdot iqr]$. For m=fence outliers lie outside of $[Q_1 - f \cdot iqr,\; Q_3 + f \cdot iqr]$ (sd: standard deviation; iqr: interquartile range; $Q_1$, $Q_3$: 1st and 3rd quartile).

Interpolation

Only linear interpolation is supported. Horizontal extrapolation is carried out at file boundaries.
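The three outlier criteria and the subsequent interpolation can be sketched as follows. This is an illustration under stated assumptions rather than the toolkit's code: the function names are hypothetical, voiceless frames are assumed to be coded as 0, and the statistics are computed over the voiced frames only.

```python
import numpy as np

def outlier_mask(f0, f=2.0, m="mean"):
    """Boolean mask of outliers among the voiced (non-zero) f0 samples."""
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0                       # assumption: 0 marks voiceless frames
    v = f0[voiced]
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    if m == "mean":
        lo, hi = v.mean() - f * v.std(), v.mean() + f * v.std()
    elif m == "median":
        med = np.median(v)
        lo, hi = med - f * iqr, med + f * iqr
    elif m == "fence":
        lo, hi = q1 - f * iqr, q3 + f * iqr
    else:
        raise ValueError("m must be 'mean', 'median', or 'fence'")
    return voiced & ((f0 < lo) | (f0 > hi))

def interpolate(f0, bad):
    """Linear interpolation over bad samples, flat extrapolation at the edges."""
    f0 = np.asarray(f0, dtype=float)
    idx = np.arange(len(f0))
    good = ~bad
    # np.interp holds the first/last good value constant outside the good range
    return np.interp(idx, idx[good], f0[good])

f0 = np.array([0, 0, 100, 102, 104, 400, 108, 0, 112, 0], dtype=float)
bad = outlier_mask(f0, f=1.5, m="fence") | (f0 == 0)
clean = interpolate(f0, bad)
```

In this toy contour the value 400 is flagged by the fence criterion, and both the outlier and the voiceless frames are bridged linearly.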

Smoothing

The smoothing method is chosen by preproc:smooth:mtd. Median and Savitzky-Golay filtering are supported. Median filtering yields smoother contours, while Savitzky-Golay filtering better preserves local f0 maxima and minima. The higher the window length preproc:smooth:win, the smoother the contours. For Savitzky-Golay filtering the polynomial order needs to be specified by preproc:smooth:ord. The lower the order, the more the result is smoothed away from the input data.

navigation: do_preproc
feature sets: –
option sub-dictionary: preproc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:f0:*

Semitone conversion

If preproc:st=1, Hertz (Hz) values are transformed to semitones (st) as follows: $F_{st} = 12 \cdot \log_2(F_{Hz} / b)$. $b$ is a base value which is calculated separately for each channel in each f0 file. It is defined as the median of the values below the percentile preproc:base_prct and can be used for f0 normalization by file and channel. Alternatively, a grouping variable can be specified, so that for each of its levels a separate f0 base value is calculated. This is done by preproc:base_prct_grp. There it can be specified which grouping variable is to be assigned to each channel. The grouping variable must be encoded in the filename and must be extractable from fsys:grp:lab. An example: you have stereo f0 files with the name pattern speakerChannel1_speakerChannel2, and you want to calculate separately for each speaker an f0 base value which is the median of the values below the 5th percentile over all of this speaker's utterances in the corpus. This is to be configured as follows:

fsys:grp:src='f0'
fsys:grp:sep='_'
fsys:grp:lab=['speakerChannel1','speakerChannel2']
preproc:base_prct=5
preproc:base_prct_grp:'1'='speakerChannel1'
preproc:base_prct_grp:'2'='speakerChannel2'

This assigns to each channel the grouping variable to be read from the f0 file names. Note that (1) channel indices need to be written in quotation marks, and (2) a shared semantics across the grouping variables is assumed. E.g. just one base value will be calculated for speaker x, regardless of whether she was recorded in channel 1 or 2.

Base value subtraction

If preproc:st is 0, the base value introduced in the preceding paragraph will be subtracted from the f0 contour without semitone conversion. If you do not want to use any base value, neither for subtraction nor as conversion reference, set preproc:base_prct=0.

The energy contour is simply represented in terms of the root mean squared deviation (RMSD) within the windowed signal. The relevant parameters can be found below styl:gnl_en and styl:rhy_en:sig. win defines the window length and sts the step size. The energy value sample rate is thus 1/sts. wintyp and winparam give the window type and an additional parameter passed on to the get_window() function of the scipy.signal module. For customizing energy extraction with other than default values, please consult the scipy.signal documentation for get_window(). wintyp and winparam can contain any value specified in this documentation.

Windows serve (1) to transform time stamps from an event tier to segments, and (2) to locally normalize feature values.

Time stamps to segments

Most feature sets are calculated for segments, not for time stamps. Thus event tier input is converted to segments by centering a symmetric analysis window of length preproc:point_win on each time stamp as shown in Figure 3. Features are then extracted within this window. The window can also be specified separately for each feature set by preproc:myFeatureSet:point_win. For local contour stylization a segment and an event tier can be processed in parallel as explained in section 8.7.

Figure 3: Segment and event tier input. A symmetric analysis window is centered on events. For local contour stylization, segment and event tiers can be integrated for time normalization: the event is set to 0, the pre-event part of the segment to [−1, 0], and the post-event part to [0, 1].

Normalization

For the feature sets loc, gnl_f0 and gnl_en several feature values are additionally normalized locally to capture their magnitude relative to the local environment. The length of this environment is defined by preproc:nrm_win. For event tier input the normalization window is centered on each time stamp. For segment tier input, it is centered on the midpoint of each segment. For parallel segment and event tier input, which can be provided for loc feature extraction, the window is centered on the event's time stamp within the segment. The window can also be specified separately for each feature set by preproc:myFeatureSet:nrm_win.

Figure 4: Analysis and longer normalization window. The values derived in the analysis window are divided by the corresponding values in the normalization window.
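The interplay of analysis and normalization window can be sketched for a single event as follows. The helper names are hypothetical, the mean stands in for any of the normalized features, and the window is clipped to the contour's time range only for the sake of a self-contained example.

```python
import numpy as np

def centered_window(t_event, win_length, t_min, t_max):
    """Symmetric window around a time stamp, clipped to [t_min, t_max]."""
    half = win_length / 2.0
    return max(t_min, t_event - half), min(t_max, t_event + half)

def window_mean(time, values, window):
    """Mean of all contour values whose time stamp lies inside the window."""
    start, end = window
    inside = (time >= start) & (time <= end)
    return float(values[inside].mean())

# toy contour sampled at 100 Hz: flat at 100 with a bump of 120 around 0.5 s
time = np.arange(0.0, 1.0, 0.01)
f0 = np.where(np.abs(time - 0.5) <= 0.05, 120.0, 100.0)

analysis = centered_window(0.5, 0.1, 0.0, 1.0)  # stands in for preproc:point_win
context = centered_window(0.5, 0.6, 0.0, 1.0)   # stands in for preproc:nrm_win
# feature value in the analysis window divided by its value in the longer window
normalized_mean = window_mean(time, f0, analysis) / window_mean(time, f0, context)
```

A quotient above 1 indicates that the event stands out from its local environment.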

Window constraints

Analysis and normalization windows are limited to the corresponding segment in the parent tier domain. For loc features this domain is given by the global segment tier. For the other features it is given by the speech chunk tier if this tier is defined in fsys:chunk:tier. This means that analysis and normalization are not carried out across global segments or chunks, respectively. An exception can be made for the bnd feature set, which might be meaningful for chunk boundaries, too. If so, styl:bnd:cross_chunk is to be set to 1. For segment tier input the minimum length of the normalization window is set to the length of the respective segment. This implies that for segments longer than the defined normalization window, normalized feature values are the same as the non-normalized ones.

navigation: do_preproc
feature sets: –
option sub-dictionary: preproc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:...:{t|to|tn}

8.4 Superposition

The core concept of CoPaSul is to represent an f0 contour as a superposition of a linear global component and polynomial local components as shown in Figure 5.

Figure 5: Superposition of one global and four local contours.

Stylization is carried out as follows: within each global segment of the tier fsys:glob:tier (e.g. an intonation phrase) a linear register level and range representation is fitted. After subtraction of this global component, an n-th order polynomial is fitted to the f0 residual within each local segment. As an alternative to register level subtraction, the f0 residual can also be derived by normalization of the contour to the register range.

In the annotation files global segments can be defined in 2 ways:
1. by start and end point (segment tier input specified in fsys:glob:tier)
2. by the segments' right end points (event tier input specified in fsys:glob:tier that contains e.g. break index labels)

In the second case the events are expanded to segments between the annotated boundary time stamps. Pauses marked by an empty label or a pause label (fsys:label:pau) are skipped and the onset of the subsequent segment is set to the end of the pause. Therefore, in point tiers pauses should be marked at their right end. Furthermore, if chunks are provided by fsys:tier:chunk, then the expanded segments do not cross chunk boundaries but end and start with the boundaries of the respective chunk they are part of.

Global segments are represented in terms of a time-varying f0 register. Register aspects are level (midline) and range (topline − baseline).

Figure 6: Register (level and range) stylization in global contour segments.

navigation: do_styl_glob
feature sets: glob
option sub-dictionary: styl:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:glob:*

The register fitting procedure consists of the following steps:
• A window of length styl:glob:decl_win is shifted along the f0 contour with a step size of 10 ms.
• Within each window the f0 median is calculated
  – of the values below the styl:glob:prct:bl percentile for the baseline,
  – of the values above the styl:glob:prct:tl percentile for the topline, and
  – of all values for the midline.
  This gives 3 sequences of medians, one for the base-, the mid-, and the topline, respectively.
• To each of the three median sequences a linear regression line is fitted. To be able to compare contours across global segments of different lengths, time is normalized as specified by styl:glob:nrm:mtd to the range styl:glob:nrm:rng.

The motivation for using f0 medians relative to respective percentiles instead of local peaks and valleys is twofold. First, the stylization is less affected by prominent pitch accents and boundary tones. Second, errors resulting from incorrect local peak detection are circumvented. Both enhance stylization robustness, as shown in [21].

The following configuration parameters serve to customize how closely the base- and topline should follow local minima and maxima:

styl:glob:prct:bl
styl:glob:prct:tl
styl:glob:decl_win

A closer fit to local peaks and valleys is achieved by lowering styl:glob:prct:bl and styl:glob:decl_win, and by raising styl:glob:prct:tl. Note, however, that a closer fit will result in a higher percentage of base- and topline crossings. In the resulting Python dictionary such error cases are marked as described in section 12.3.2.

From this stylization, regression line slope and intercept features are collected for the base-, mid-, and topline, as well as for the range. For the latter these features are simply derived by fitting a linear regression line through the point-wise distances between the base- and the topline. A negative slope means that base- and topline converge, whereas a positive slope signals line divergence.
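The register fitting steps can be sketched as follows. This is a simplified illustration, not the toolkit's implementation: the function name fit_register and the default values are hypothetical, time is min-max normalized to [0, 1], and the range line is fitted through the differences of the topline and baseline median sequences.

```python
import numpy as np

def fit_register(time, f0, decl_win=0.1, step=0.01, prct_bl=10, prct_tl=90):
    """Fit base-, mid-, topline and range; return {name: (slope, intercept)}."""
    t_norm = (time - time[0]) / (time[-1] - time[0])  # normalized time in [0, 1]
    centers, med = [], {"bl": [], "ml": [], "tl": []}
    for c in np.arange(time[0], time[-1] + step / 2, step):
        y = f0[np.abs(time - c) <= decl_win / 2]       # samples in the window
        if y.size == 0:
            continue
        lo, hi = np.percentile(y, [prct_bl, prct_tl])
        med["bl"].append(np.median(y[y <= lo]))        # median of the low values
        med["tl"].append(np.median(y[y >= hi]))        # median of the high values
        med["ml"].append(np.median(y))                 # median of all values
        centers.append(np.interp(c, time, t_norm))
    lines = {}
    for name, seq in med.items():
        slope, intercept = np.polyfit(centers, seq, 1)
        lines[name] = (slope, intercept)
    # range: regression line through the point-wise topline-baseline distances
    rng = np.array(med["tl"]) - np.array(med["bl"])
    lines["rng"] = tuple(np.polyfit(centers, rng, 1))
    return lines

# toy contour: declining f0 with a 5 Hz oscillation, sampled at 100 Hz
time = np.arange(0.0, 2.0, 0.01)
f0 = 200.0 - 20.0 * time + 10.0 * np.sin(2 * np.pi * 5.0 * time)
lines = fit_register(time, f0, decl_win=0.2)
```

For this toy contour all three lines decline, and the range slope equals the difference of the topline and baseline slopes.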

Global contour classes for analyses on the categorical level are derived by slope clustering. The cluster method can be chosen by clst:glob:mtd. If the user expects a certain number of classes, this number can be specified by clst:glob:kMeans:n_cluster. Otherwise, meanShift clustering should be chosen, either as the cluster method, or in combination with kMeans for the sake of centroid initialization. For customizing the clustering settings by non-default values several parameters are provided whose values are passed on to the respective Python sklearn functions. These parameters are named as in sklearn. If needed, please consult the descriptions of the sklearn functions KMeans, MeanShift, and estimate_bandwidth. Figure 7 gives an example for global and local contour classes.

navigation: do_clst_glob
feature sets: glob
option sub-dictionary: clst:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:glob:class

Dependent on styl:register the influence of the global component is removed from the f0 contour in order to derive the f0 residual for subsequent local contour stylization. If styl:register is set to bl, ml, or tl, then the base-, mid-, or topline is subtracted. If the parameter is set to rng, each f0 point is normalized to the local f0 range: the corresponding points on the base- and topline are set to 0 and 1, respectively. Thus f0 values between base- and topline are within the range [0 1], f0 values below the baseline are < 0, and values above the topline are > 1. For styl:register=none no global component influence is removed.

Figure 7: Global and local intonation contour classes derived by clustering.
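The residual calculation for the different styl:register settings can be sketched as follows. The function name f0_residual is hypothetical, and the register lines are assumed to be given as arrays sampled at the same time points as the f0 contour.

```python
import numpy as np

def f0_residual(f0, baseline, midline, topline, register="ml"):
    """Remove the global register component from an f0 contour."""
    f0 = np.asarray(f0, dtype=float)
    if register == "none":
        return f0.copy()                   # no global influence removed
    if register == "rng":
        # baseline maps to 0, topline maps to 1
        return (f0 - baseline) / (topline - baseline)
    reference = {"bl": baseline, "ml": midline, "tl": topline}[register]
    return f0 - reference                  # line subtraction

f0 = np.array([90.0, 100.0, 110.0, 120.0, 130.0])
bl, ml, tl = np.full(5, 100.0), np.full(5, 110.0), np.full(5, 120.0)
res_rng = f0_residual(f0, bl, ml, tl, register="rng")
res_ml = f0_residual(f0, bl, ml, tl, register="ml")
```

In the range-normalized residual the first value lies below 0 and the last one above 1, since these f0 points fall outside the band spanned by base- and topline.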

In the annotation files local segments can be defined in 3 ways:
1. by start and end (segment tier input specified in fsys:loc:tier_ag)
2. by a center (event tier input specified in fsys:loc:tier_acc)
3. by both (segment + event tier input)

For case (2) time stamps are transformed to segments by placing a symmetric window of length preproc:point_win on each time stamp. In order to be able to compare contours across different segment lengths, for (1) and (2) time is normalized as specified in styl:loc:nrm. styl:loc:nrm:mtd=minmax yields a minmax time normalization to the range styl:loc:nrm:rng.

For (3) the time stamp within the segment is treated as the zero-center, that is, time is normalized to [−1, 0] in the pre-event part and to [0, 1] in the post-event part of the segment. Only those segments of fsys:loc:tier_ag are considered for feature extraction to which at least one center is assigned in tier fsys:loc:tier_acc. preproc:loc_align serves for a robust treatment of multiple center assignments. If this option is set to skip, segments with more than one center are skipped. With left the first center is kept, with right the last one.

The f0 residual contour (see section 8.6) in each local segment is stylized by n-th order polynomials. The order is given by styl:loc:ord.

Figure 8: Local contour stylization by means of a 3rd order polynomial. Time is normalized to the range [0 1].

As can be seen in Figure 9 the polynomial coefficients are related to several aspects of local f0 shapes. Given the polynomial $\sum_{i=0}^{3} s_i \cdot t^i$, $s_0$ is related to the local f0 level relative to the register level. $s_1$ and $s_3$ are related to the local f0 trend (rising or falling) and, for annotation cases (2) and (3), to peak alignment: negative values indicate early, and positive values late peaks. $s_2$ determines the peak shape (convex or concave) and its acuity: positive $s_2$ values indicate convex (falling-rising) shapes, negative values concave (rising-falling) shapes, and high absolute values indicate stronger acuity.

Figure 9: Influence of each coefficient of the third order polynomial $\sum_{i=0}^{3} s_i \cdot t^i$ on the local contour shape. All other coefficients are set to 0. For compactness, both function and coefficient values are shown on the y-axis if they differ.

navigation: do_styl_loc
feature sets: loc
option sub-dictionary: styl:loc:*; styl:register
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:acc:*

Local contour classes for analyses on the categorical level are derived by polynomial coefficient clustering. The cluster method can be chosen by clst:loc:mtd. If the user expects a certain number of classes, this number can be specified by clst:loc:kMeans:n_cluster. Otherwise, meanShift clustering should be chosen, either as the cluster method, or in combination with kMeans for the sake of centroid initialization. For customizing the clustering settings by non-default values several parameters are provided whose values are passed on to the respective Python sklearn functions. These parameters are named as in sklearn. If needed, please consult the descriptions of the sklearn functions KMeans, MeanShift, and estimate_bandwidth. Figure 7 gives an example for global and local contour classes.

navigation: do_clst_loc
feature sets: loc
option sub-dictionary: clst:loc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:class
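The local contour stylization, i.e. the polynomial fit that yields the coefficients used for clustering, can be sketched as follows. The function names are hypothetical. For parallel segment and event input the sketch assumes that the pre-event part is mapped to [−1, 0] and the post-event part to [0, 1], and that the event lies strictly inside the segment.

```python
import numpy as np

def normalize_time(time, t_event=None):
    """Map time to [0, 1], or to [-1, 1] with the event at 0 if t_event is given."""
    time = np.asarray(time, dtype=float)
    if t_event is None:
        return (time - time[0]) / (time[-1] - time[0])
    t = np.empty_like(time)
    pre = time <= t_event
    # pre-event part -> [-1, 0], post-event part -> [0, 1]
    t[pre] = (time[pre] - t_event) / (t_event - time[0])
    t[~pre] = (time[~pre] - t_event) / (time[-1] - t_event)
    return t

def stylize_local(time, residual, order=3, t_event=None):
    """Polynomial coefficients s_0 ... s_n of the f0 residual over normalized time."""
    t = normalize_time(time, t_event)
    # np.polyfit returns the highest order first, hence the reversal
    return np.polyfit(t, residual, order)[::-1]

time = np.linspace(0.0, 1.0, 51)
tn = normalize_time(time, t_event=0.5)
residual = 1.0 - 2.0 * tn ** 2            # rising-falling toy contour
coefs = stylize_local(time, residual, order=3, t_event=0.5)
```

The negative second-order coefficient of the toy contour corresponds to the concave (rising-falling) shape described above.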

Standard f0 and energy features are e.g. mean, standard deviation, median, interquartile range, and maximum. They will be calculated for the f0 contours in local contour segments. Additionally, the feature values are locally normalized within a window of length preproc:nrm_win. See section 8.3 for window length specifications in dependence of the annotation tier type.

navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:gnl_*:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:gnl:*

As with global segments, register features can also be extracted for local segments in exactly the same way as introduced in section 8.5.2.

navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:glob:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:decl:*

8.7.6 Gestalt features

Gestalt features quantify the deviation of the local contour register from the global contour register as shown in Figure 10. For this purpose the register properties of the local segment are compared with the properties of the dominating global segment in terms of root mean squared deviations and slope differences. For each register representation (base-, mid-, topline, and range regression line), the RMSD between the local and global declination line is calculated. The higher these values, the more the local contour sticks out from the global contour, which is of relevance for studies on prominence, accent group patterns [2], and prosodic headedness [22, 23].

Figure 10: Gestalt stylization: deviation of the local contour register aspects (base-, mid-, topline, range) from the global contour register.

The inherent Gestalt properties of the local contours are again represented in terms of polynomial coefficients. For this purpose polynomials of n-th order specified by styl:loc:ord are fitted to all supported kinds of f0 residuals: subtraction of base-, mid-, and topline, and range normalization. This yields 4 coefficient vectors, one for each residual.

navigation: do_styl_loc_ext
feature sets: loc
option sub-dictionary: styl:loc:*
output sub-dictionary: data:myFileIdx:myChannelIdx:loc:gst:*
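The comparison of a local and a global register line can be sketched as follows. The function name is hypothetical, and for simplicity both lines are assumed to be given as (slope, intercept) pairs on the same time axis, which sidesteps the separate time normalization of local and global segments.

```python
import numpy as np

def gestalt_deviation(t, local_line, global_line):
    """RMSD and slope difference between a local and a global register line.

    t: time points covered by the local segment
    local_line, global_line: (slope, intercept) on the same time axis (assumption)
    """
    t = np.asarray(t, dtype=float)
    loc = local_line[0] * t + local_line[1]
    glb = global_line[0] * t + global_line[1]
    rmsd = float(np.sqrt(np.mean((loc - glb) ** 2)))
    return rmsd, local_line[0] - global_line[0]

t = np.linspace(0.4, 0.8, 21)
# local midline runs parallel to the global midline, 3 units above it
rmsd, slope_diff = gestalt_deviation(t, local_line=(-10.0, 103.0),
                                     global_line=(-10.0, 100.0))
```

The same comparison would be repeated for the base-, mid-, and topline as well as for the range line.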

Standard features are e.g. mean, standard deviation, median, interquartile range, and maximum. They will be calculated for f0 and energy contours over the entire file and for segments in an arbitrary number of annotation tiers specified in fsys:gnl_f0:tier and fsys:gnl_en:tier, respectively. For event tiers, the segments are given by centering an analysis window of length preproc:point_win on the time stamps. Additionally, the feature values are locally normalized within a window of length preproc:nrm_win. See section 8.3 for window length specifications in dependence of the annotation tier type. Furthermore, f0 and energy quotients are calculated between the mean values derived in contour initial and final windows and in the respective remainder part of the contour. The length of this window is specified by styl:gnl:win. Finally, a second order polynomial is fitted through the f0 or energy contour, for which time is normalized to the range [0 1].

navigation: do_styl_gnl_f0
feature sets: gnl_f0, gnl_f0_file
option sub-dictionary: styl:gnl_f0:*
output sub-dictionary: data:myFileIdx:myChannelIdx:gnl_f0:*, data:myFileIdx:myChannelIdx:gnl_f0_file:*
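The initial and final window quotients and the second order polynomial can be sketched as follows. The function names are hypothetical, and the window length is given in samples here rather than in the time unit of styl:gnl:win.

```python
import numpy as np

def edge_quotients(values, n_win):
    """Mean of the initial/final n_win samples divided by the mean of the rest."""
    values = np.asarray(values, dtype=float)
    q_init = values[:n_win].mean() / values[n_win:].mean()
    q_fin = values[-n_win:].mean() / values[:-n_win].mean()
    return q_init, q_fin

def quadratic_shape(values):
    """Coefficients c_0, c_1, c_2 of a 2nd order polynomial over time in [0, 1]."""
    values = np.asarray(values, dtype=float)
    t = np.linspace(0.0, 1.0, len(values))
    # np.polyfit returns the highest order first, hence the reversal
    return np.polyfit(t, values, 2)[::-1]

contour = np.array([2.0, 2.0, 1.0, 1.0, 1.0, 1.0, 3.0, 3.0])
q_init, q_fin = edge_quotients(contour, n_win=2)

t = np.linspace(0.0, 1.0, 20)
coefs = quadratic_shape(1.0 + 2.0 * t + 3.0 * t ** 2)
```

Quotients above 1 indicate that the contour starts or ends above the level of its remainder.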

An additional standard feature for energy only is spectral balance. It is realized as the SPLH–SPL measure, i.e. the signal's sound pressure level subtracted from the level after pre-emphasis. Pre-emphasis can be carried out in the time or frequency domain (styl:gnl_en:sb:domain). The latter is implemented as proposed by [5]. In the time domain pre-emphasis is calculated as follows: $s'[i] = s[i] - \alpha \cdot s[i-1]$. $\alpha$ is set by styl:gnl_en:sb:alpha and determines the lower frequency boundary for pre-emphasis by 6 dB per octave. 0.95 roughly corresponds to 150 Hz; the smaller the value for $\alpha$, the higher the lower boundary. Alternatively, $\alpha$ can be set directly to the lower frequency boundary $F$ and will be internally transformed to $\alpha = e^{-2 \pi F \Delta t}$. Note that pre-emphasis in the time domain usually leads to an overall lower energy so that SPLH–SPL will be negative.

In the frequency domain pre-emphasis is carried out according to [5] by adding 10 · log((1 + f …)/(1 + f …)) to the logarithmic spectrum.

The spectral balance calculation can be restricted to a specified time and/or frequency window. The time window length is specified by styl:gnl_en:sb:win to cut out the center of that length of the segment to be analysed. It serves to reduce the influence of coarticulation on the results. High-, low- or band-pass cutoff frequencies (styl:gnl_en:sb:f; filter type: styl:gnl_en:sb:btype) might be used to limit the analysis to a specified frequency band (e.g. an upper cutoff frequency of 5000 Hz for vowels).

navigation: do_styl_gnl_en
feature sets: gnl_en, gnl_en_file
option sub-dictionary: styl:gnl_en:*
output sub-dictionary: data:myFileIdx:myChannelIdx:gnl_en:*, data:myFileIdx:myChannelIdx:gnl_en_file:*

Boundaries are parameterized in terms of discontinuity features of several register representations.
Details and an application for perceived prosodic boundary strength prediction can be found in [21].

Boundary features can be extracted for any number of segment or event tiers specified by fsys:bnd:tier. Features can be extracted for:
1. navigate:do_styl_bnd: each adjacent segment pair. For event tiers, segments are defined as the intervals between two time stamps. Note that this implies that pause length is only available for segment tier input, where it is defined as the gap between the second segment's starting point and the first segment's endpoint.
2. navigate:do_styl_win: fixed time windows. For segment tiers, the pre- and post-boundary units are not given by the adjacent segments, but by windows of fixed length. For event tiers the window halves of preproc:point_win centered on a time stamp are considered as pre- and post-boundary units.
3. navigate:do_styl_trend: pre- and post-boundary units that range from the current chunk start to the boundary, and from the boundary to the chunk end. If no chunking is available, the file start and endpoint are taken.

For cases (2) and (3) the following holds: if styl:bnd:cross_chunk is set to 0, and if a chunk tier is given by fsys:tier:chunk, the analysis windows are limited by the start and endpoint of the current chunk.

A boundary is parameterized in terms of pause length (for segment tier input only) and pitch discontinuities. For the latter, register features (as described in section 8.5.2) are extracted three times: for the pre-boundary segment, for the post-boundary segment, and for the concatenation of both segments. Figure 11 illustrates the threefold register stylization for the pre- and post-boundary as well as for the concatenated segment. Figure 12 shows how discontinuity for each of the register lines is expressed. Let $seg_1$, $seg_2$ be the pre- and post-boundary segments, and $seg_{12}$ their concatenation. Then discontinuity is given by:
• the RMSD between the four register representations of $seg_1$ and the corresponding part of $seg_{12}$. The register representations are base-, mid-, topline, and range regression line.
• the RMSD between the register representations of $seg_2$ and the corresponding part of $seg_{12}$
• the RMSD between the register representations of $seg_1$ and $seg_2$ opposed to $seg_{12}$
• the reset $d$, i.e. the difference between the initial value of the regression line in $seg_2$ and the final value of the regression line in $seg_1$
• the onset difference of the regression lines $d_o$, i.e. the initial value of the $seg_1$ regression line subtracted from the initial value of the $seg_2$ line
• the difference of the regression line mean values $d_m$, the $seg_1$ mean being subtracted from the $seg_2$ mean. Both $d_o$ and $d_m$ could be used to measure downstep.
• the pairwise slope differences $s_*$ between the 3 regression lines: for $s_{1\_2}$ the $seg_2$ slope is subtracted from the $seg_1$ slope. For $s_{12\_1}$ and $s_{12\_2}$ the slopes of $seg_1$ and $seg_2$ are subtracted from the $seg_{12}$ slope.
• the correlation-based distances between the fitted lines calculated for the same combinations as the RMSD values above. Pearson $r$ correlations are turned into distance $d$ values ranging from 0 to 1 by $d = (1 - r)/2$.
• the quotient of RMS errors between stylization input (the respective sequence of medians) and output (the fitted lines). The error of the joint stylization is divided by the error from the single pre- and post-boundary fits. The quotient is reported separately for the entire, the pre-boundary, and the post-boundary segment.
• the increase of the Akaike information criterion (AIC) resulting from one joint vs two separate fits. The AIC does not only account for the fitting error but also for the number of model parameters. The lower its value, the better the model. For least squares fit comparisons the AIC can be calculated as $2 \cdot k + n \cdot \ln RSS$. $k$ denotes the number of model parameters, $n$ the number of stylization input values, and RSS the residual sum of squares. To each fitted line 3 parameters are assigned: intercept, slope, and Gaussian noise variation. The AIC increase is measured by subtracting the single line fit AIC from the joint fit AIC. It is reported separately for the entire, the pre-boundary, and the post-boundary segment.

All features are calculated 4 times, for the base-, mid- and toplines, as well as for the range regression lines. All but the reset and the slope difference variables are positively related to discontinuity. The user might want to replace the reset and slope differences by their absolute values.

In the styl:bnd option sub-dictionary nrm, decl_win, and prct have the same purpose as in the styl:glob context, see section 8.5.2.
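A subset of these discontinuity features can be sketched for a single register line as follows. This is an illustration, not the toolkit's code: function and key names are hypothetical, the sign conventions (post-boundary minus pre-boundary values) are assumptions, and the two separate fits are treated as one model with 6 parameters when computing the AIC.

```python
import numpy as np

def fit_line(t, y):
    """Least squares line; returns slope, fitted values, residual sum of squares."""
    slope, intercept = np.polyfit(t, y, 1)
    fitted = slope * t + intercept
    return slope, fitted, float(np.sum((y - fitted) ** 2))

def rmsd(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

def aic(rss, n, k):
    """AIC for least squares fits: 2k + n ln(RSS)."""
    return 2 * k + n * np.log(rss)

def discontinuity(t1, y1, t2, y2):
    """Boundary features from pre- (1), post- (2), and joint (12) line fits."""
    t12, y12 = np.concatenate([t1, t2]), np.concatenate([y1, y2])
    s1, f1, rss1 = fit_line(t1, y1)
    s2, f2, rss2 = fit_line(t2, y2)
    s12, f12, rss12 = fit_line(t12, y12)
    n1, n = len(t1), len(t12)
    return {
        "reset": f2[0] - f1[-1],           # post onset minus pre offset (assumed sign)
        "d_o": f2[0] - f1[0],              # onset difference
        "d_m": f2.mean() - f1.mean(),      # mean difference
        "s_1_2": s1 - s2,                  # pre minus post slope
        "rms_pre": rmsd(f1, f12[:n1]),     # pre line vs its part of the joint line
        "rms_post": rmsd(f2, f12[n1:]),    # post line vs its part of the joint line
        "aic_increase": aic(rss12, n, 3) - aic(rss1 + rss2, n, 6),
    }

# two declining stretches with a pitch reset of about 30 Hz at the boundary
t1, t2 = np.linspace(0.0, 1.0, 50), np.linspace(1.02, 2.0, 50)
y1 = 200.0 - 20.0 * t1 + np.sin(30.0 * t1)
y2 = 210.0 - 20.0 * (t2 - 1.0) + np.sin(30.0 * t2)
feat = discontinuity(t1, y1, t2, y2)
```

The joint line fits the reset poorly, so both RMSD values and the AIC increase are positive for this toy example.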
styl:bnd:win specifies the window length of the segments for window case (2).

Figure 11: Prosodic boundaries: threefold base-, mid-, and topline register stylization for the pre-boundary, post-boundary, and the concatenated segment.

Figure 12: Boundary features describing reset and deviation from a common trend. In this case features are extracted at a word boundary wrd-bnd. The 3 regression lines can refer to f0 baselines, midlines, toplines, and to range. The same features are output for these 4 register aspects.

The boundary feature extraction can be carried out on the (preprocessed) f0 contour or on the f0 residual by setting styl:bnd:residual to 0 or 1, respectively. The former should be used if boundaries between global segments such as intonation phrases are examined. The residual might be used if the user is interested in boundaries between e.g. accent groups within the same global segment. Note that for residuals the boundary examination across global segments might not be meaningful, since at these boundaries the residuals are derived from different register regression lines. These cases can be identified in the output by means of the is_fin column (see section 12.1). The residual calculation is described in section 8.6. Running boundary stylization on residuals requires a previous global contour stylization, i.e. styl:navigate:do_styl_glob needs to be set to 1.

The subsequent paragraphs name the configuration branches associated with the stylization cases (1)–(3), respectively.
navigation: do_styl_bnd
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:std:*

navigation: do_styl_bnd_win
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:win:*

navigation: do_styl_trend
feature sets: bnd
option sub-dictionary: styl:bnd:*
output sub-dictionary: data:myFileIdx:myChannelIdx:bnd:trend:*

Rhythm features can be extracted for any number of segment or event tiers specified by fsys:rhy_*:tier, * representing f0 and en for the f0 and the energy contour, respectively. Time stamps of event tiers are transformed to segments as introduced in section 8.3.

Rhythm measures consist of:
• spectral moments of a DCT analysis of the contour
• the number of peaks in the absolute-value DCT spectrum
• the frequency associated with the highest peak
• event rates within the analyzed segment
• the influence of these events on the f0 or energy contour within the analyzed segment

To extract the relative weight of the low- and high-frequency components of a contour, a discrete cosine transform (DCT) is applied to the contour as in [7]. For the absolute DCT coefficient values the first n (rhy_*:rhy:nsm) spectral moments are calculated that (up to the fourth moment) give the mean, variance, skew, and kurtosis of the DCT coefficient weight distribution, respectively.

Before applying the DCT the contour is weighted by a window defined by the two parameters rhy_*:rhy:wintyp and rhy_*:rhy:winparam as introduced in section 8.2.

The events (time stamps or segments) for which rate and influence are to be calculated are read from one or more tier names in fsys:rhy_*:tier_rate. Thereby within each recording channel each analysis tier in fsys:rhy_*:tier is combined with each rate tier in fsys:rhy_*:tier_rate. Rate is simply measured by counting the events that fall within the segment of analysis, and dividing this count by the length of the analyzed segment.
For segment tiers in fsys:rhy_*:tier_rate only proportions included in the segment of analysis are added to the count.

The influence $s$ of events on the f0 or energy contour is quantified as the relative weight of the DCT coefficients around the event rate $r$ (+/− rhy_*:rhy:wgt:rb Hz) within all coefficients between rhy_*:rhy:lb and rhy_*:rhy:ub Hz as follows:

$$s = \frac{\sum_{c:\; r-1 \le f(c) \le r+1\,\mathrm{Hz}} |c|}{\sum_{c:\; lb \le f(c) \le ub\,\mathrm{Hz}} |c|}$$

The higher $s$, the higher the influence of the event rate on the f0 or energy contour. Figure 13 compares a low event rate with a high impact on the energy contour to a high event rate with low impact (high vs low absolute coefficient values).

The relative weight is written to the feature table's columns myRateTier_prop (see sections 10 and 12.1). myRateTier refers to each entry in fsys:rhy_*:tier_rate. The respective analysis tiers from fsys:rhy_*:tier are displayed in the tier column. The proportion is output for each segment in the analysis tiers. Additionally, the rate of rate tier events in each analysis tier segment is provided by myRateTier_rate. Finally, myRateTier_mae gives the mean absolute error between the original contour and the inverse cosine transform output that is based on the coefficients with frequencies around the event rates. The following paragraphs name the configuration branches responsible for the rhythmic analyses of the f0 and energy contour, respectively.

myRateTier_* parameters are not calculated for analysis/rate tier combinations across recording channels. That is: given are analysis tier TA1 and rate tier RT2 referring to channels 1 and 2, respectively. Then cells in the RT2_* columns are set to NA in all TA1 rows, which are identified by the tier column.

The number of peaks n_peak in the DCT spectrum is derived by counting the local amplitude maxima in this spectrum among the values greater than or equal to the amplitude related to the center of gravity.

Figure 13: Influence of events on a contour in terms of the relative weight of the DCT coefficients around the event frequency.

navigation: do_styl_rhy_f0
feature sets: rhy_f0, rhy_f0_file
option sub-dictionary: styl:rhy_f0:*
output sub-dictionary: data:myFileIdx:myChannelIdx:rhy_f0:*; data:myFileIdx:myChannelIdx:rhy_f0_file:*
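The influence measure can be sketched as follows. This is a simplified illustration with hypothetical function names: the DCT-II is computed directly with numpy instead of a library routine, the window weighting of the contour is omitted, and the frequency of the k-th coefficient is taken as k · fs / (2N).

```python
import numpy as np

def dct2(x):
    """Unnormalized DCT-II: X_k = sum_n x_n * cos(pi / N * (n + 0.5) * k)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    return np.cos(np.pi / n * (i + 0.5) * k) @ x

def event_influence(contour, fs, rate, rb=1.0, lb=0.0, ub=10.0):
    """Relative weight of the DCT coefficients around the event rate."""
    c = np.abs(dct2(contour))
    freq = np.arange(len(contour)) * fs / (2.0 * len(contour))
    in_band = (freq >= lb) & (freq <= ub)
    near_rate = in_band & (freq >= rate - rb) & (freq <= rate + rb)
    return float(c[near_rate].sum() / c[in_band].sum())

# toy contour: a pure 4 Hz oscillation, 2 s at 100 Hz
fs = 100.0
t = (np.arange(200) + 0.5) / fs
contour = np.cos(2 * np.pi * 4.0 * t)

s_match = event_influence(contour, fs, rate=4.0)      # events at 4 per second
s_mismatch = event_influence(contour, fs, rate=8.0)   # events at 8 per second
```

An event rate that matches the oscillation of the contour yields a weight close to 1, whereas a mismatching rate yields a weight close to 0.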

The energy contour extraction in the analyzed segment is controlled by the styl:rhy_en:sig:* sub-dictionary in the same way as explained in section 8.2.

navigation: do_styl_rhy_en
feature sets: rhy_en, rhy_en_file
option sub-dictionary: styl:rhy_en:*
output sub-dictionary: data:myFileIdx:myChannelIdx:rhy_en:*; data:myFileIdx:myChannelIdx:rhy_en_file:*

Voice quality features can be extracted for any number of segment or event tiers specified by fsys:voice:tier. Time stamps of event tiers are transformed to segments as introduced in section 8.3. At the current state voice measures consist of:
• jitter,
• shimmer.

Jitter is measured as relative local jitter, i.e. as the mean absolute difference between adjacent periods divided by the overall mean period. As for Praat, the following parameters can be specified in styl:voice:jit: t_min and t_max refer to the minimum and maximum allowed period durations, and fac_max to the maximally allowed quotient of adjacent periods. Periods not fulfilling these constraints are discarded from the calculation.

Shimmer is again calculated the same way as Praat does for the Shimmer (local) parameter, i.e. it is the mean absolute difference between the amplitudes of adjacent periods, divided by the average amplitude.

For both jitter and shimmer a 3rd order polynomial is fitted through the obtained sequence of distance values of adjacent periods, each distance being divided by the average period or amplitude, respectively. Time is normalized to the interval −1 to 1. The purpose of these polynomials is to represent the changes of jitter and shimmer over time. As an example, a negative 1st order coefficient for the jitter sequence indicates a decrease in jitter over time (see Figure 9 for the interpretation of the coefficients).

The configuration branches related to the voice feature set are:

navigation: do_styl_voice
feature sets: voice, voice_file
option sub-dictionary: styl:voice:*
output sub-dictionary: data:myFileIdx:myChannelIdx:voice:*; data:myFileIdx:myChannelIdx:voice_file:*

All features are subdivided into the following sets which can be extracted independently of each other. In the subsequent listing *_file indicates that there is an additional feature extraction on the entire file level with minor deviations from the extraction on smaller domains (e.g. missing normalization).
• gnl_f0, gnl_f0_file: general standard f0 features as mean, median, standard deviation, interquartile range; for any number of tiers
• gnl_en, gnl_en_file: general standard energy features as mean, median, standard deviation, interquartile range; for any number of tiers
• glob: register (level and range) features in larger domains (e.g. intonation phrases); for one tier per channel
• loc: shape features in smaller domains (e.g. accent groups) of f0 residuals (after removal of global f0 aspects). Gestalt features, i.e. deviation of accent groups from intonation phrases. This feature set requires the precedent extraction of the glob set; for one tier per channel
• bnd, bnd_win, bnd_trend: boundary features between adjacent segments in the same domain. For bnd the features are derived from the stylization of adjacent segments. In bnd_win the stylization is carried out in uniform time windows centered on the segment boundaries irrespective of the segment lengths. In bnd_trend the stylization is carried out from the beginning of a speech chunk to the boundary in question, and from this boundary to the end of the chunk; for any number of tiers
• rhy_f0, rhy_f0_file: DCT-based rhythm features; rates of prosodic events (e.g. syllable nuclei, pitch accents) and their influence on the f0 contour; for any number of tiers
• rhy_en, rhy_en_file: DCT-based rhythm features; rates of prosodic events and their influence on the energy contour; for any number of tiers
• voice, voice_file: voice quality features as jitter and shimmer: mean values and polynomial stylization of their changing over time

Application examples for these feature sets are:

application: feature sets
pitch accent prototypes for information status and discourse segmentation [14]: glob, loc
prosodic boundary strength prediction [21]: bnd
prosodic typology [22, 23]: loc
empirical evidence for prosodic constituents (accentual phrases) [2, 23]: loc
interplay of phrasing and prominence [24]: loc, bnd, gnl_en, gnl_f0
dialog act prediction [12]: glob, loc, gnl_f0
personality trait prediction [15]: glob, loc, gnl_f0
infant-directed speech [11]: glob, loc, gnl_f0, gnl_en
entrainment [18, 17]: glob, loc
offtalk detection [10]: glob, loc, gnl_en, gnl_f0
speech disfluencies [1]: loc
pitch accent inventory for low-resource languages [8]: loc
Lombard speech characteristics [3]: bnd
Social media analyses [19]: bnd
Hand-stroke–speech coordination [6]: rhy_en, rhy_f0

The following tables list all currently available features in alphabetical order, give short descriptions and link them to the respective feature set. In these tables loc and glob within the superpositional setting refer to local (e.g. accent groups) and global segments (e.g. intonation phrases), respectively. For boundary parameterization pre, post, joint refer to the pre- and post-boundary segments, and to their concatenation, respectively. For boundary features std, win, and trend refer to the underlying windowing of neighboring segments, cf. section 8.9. The number of coefficient and spectral moment variables c* and sm* depends on the polynomial order and spectral moment number specified by the user. For the rhy_* feature sets myAnalysisTier stands for the analysis tier, and myRateTier for the rate tier, i.e. the rate and influence of events in myRateTier within segments of myAnalysisTier is measured, and all possible combinations of analysis and rate tiers are outputted.

name: description [feature set]
bl_c0: baseline intercept [glob, loc]
bl_c1: baseline slope [glob, loc]
bl_d_fin: final baseline value diff loc-glob [loc]
bl_d_init: initial baseline value diff loc-glob [loc]
bl_m: baseline mean value [glob, loc]
bl_r: baseline reset [glob]
bl_rate: baseline declination rate [glob, loc]
bl_rms: baseline RMSD loc-glob [loc]
bl_sd: baseline slope diff loc-glob [loc]
bv: file-domain f0 base value (Hz) [glob, gnl_f0_file]
c*: polynomial loc contour coef * [loc, gnl_f0/en(_file)]
ci: channel index (starting with 0) [(all sets)]
class: contour class [glob, loc]
dur: segment duration [glob, loc, gnl_f0/en(_file), rhy_f0/en(_file)]
dur_nrm: normalized duration [loc, gnl_f0/en]
f_max: freq of coef with max ampl. in DCT spectrum [rhy_f0/en(_file)]
fi: file index (starting with 0) [(all sets)]
gi: si value of corresponding row in glob [loc]
iqr: f0 interquartile range [glob, loc, gnl_f0/en(_file)]
iqr_nrm: nrm'd f0 interquartile range [loc, gnl_f0/en]
is_fin: item in global segment's final position? [(all sets w/o *_file)]
is_fin_chunk: item in chunk final position? [(all sets w/o *_file)]
is_init: item in global segment's initial position? [(all sets w/o *_file)]
is_init_chunk: item in chunk initial position? [(all sets w/o *_file)]
jit: jitter [voice(_file)]
jit_c*: polynomial coefs for jitter time course [voice]
shim: shimmer [voice(_file)]
shim_c*: polynomial coefs for shimmer time course [voice(_file)]
lab: label [glob, bnd, gnl_f0/en, rhy_f0/en]
lab_acc: ACC tier label [loc]
lab_ag: AG tier label [loc]
lab_next: next segment's label [bnd]
m: f0, energy arit. mean [glob, loc, gnl_f0/en(_file)]
m_nrm: f0, energy arit. nrm'd mean [loc, gnl_f0/en]
max: f0, energy max [glob, loc, gnl_f0/en(_file)]
max_nrm: f0, energy nrm'd max [loc, gnl_f0/en]
med: f0, energy median [glob, loc, gnl_f0/en(_file)]
med_nrm: f0, energy nrm'd median [loc, gnl_f0/en]
ml_c0: midline intercept [glob, loc]
ml_c1: midline slope [glob, loc]
ml_d_fin: final midline value diff loc-glob [loc]
ml_d_init: initial midline value diff loc-glob [loc]
ml_m: midline mean value [glob, loc]
ml_r: midline reset [glob]
ml_rate: midline declination rate [glob, loc]
ml_rms: midlines RMSD loc-glob [loc]
ml_sd: midline slope diff loc-glob [loc]
n_peak: number of peaks in absolute DCT spectrum [rhy_f0/en(_file)]
p: pause length (sec) [bnd]
qb: quotient of means of init and fin part [gnl_f0/en(_file)]
qf: quotient of means of final and non-fin part [gnl_f0/en(_file)]
qi: quotient of means of initial and non-init [gnl_f0/en(_file)]
qm: quotient of means max(init, fin) part and remainder [gnl_f0/en(_file)]
res_bl_c*: baseline residual poly coef * [loc]
res_ml_c*: midline residual poly coef * [loc]
res_rng_c*: range line residual poly coef * [loc]
res_tl_c*: topline residual poly coef * [loc]
rms: overall RMSD [gnl_en]
rms_nrm: nrm'd overall RMSD [gnl_en]
rng_c0: range line intercept [glob, loc]
rng_c1: range line slope [glob, loc]
rng_d_fin: final range line value diff loc-glob [loc]
rng_d_init: initial range line value diff loc-glob [loc]
rng_m: range mean value [glob, loc]
rng_r: range line reset [glob]
rng_rate: range declination rate [glob, loc]
rng_rms: range lines RMSD loc-glob [loc]
rng_sd: range line slope diff loc-glob [loc]
sb: spectral balance [gnl_en]
sd: f0, energy standard deviation [glob, loc, gnl_f0/en(_file)]
sd_nrm: nrm'd f0, energy standard deviation [loc, gnl_f0, gnl_en]
si: segment index (starting with 0) [glob, loc, gnl_f0/en, rhy_f0/en]
sm*: *th spectral moment of DCT [rhy_f0/en(_file)]
std|trend|win_bl_aicI: baseline fitting AIC increase joint vs pre+post [bnd]
std|trend|win_bl_aicI_post: baseline fitting AIC increase joint vs post [bnd]
std|trend|win_bl_aicI_pre: baseline fitting AIC increase joint vs pre [bnd]
std|trend|win_bl_corrD: pre/post-joint baseline corr-based distance [bnd]
std|trend|win_bl_corrD_post: post-joint baseline corr-based distance [bnd]
std|trend|win_bl_corrD_pre: pre-joint baseline corr-based distance [bnd]
std|trend|win_bl_d_m: difference of baseline means pre-post [bnd]
std|trend|win_bl_d_o: difference of baseline onsets pre-post [bnd]
std|trend|win_bl_r: pre-post baseline reset [bnd]
std|trend|win_bl_rms: pre/post-joint baseline RMSD [bnd]
std|trend|win_bl_rms_post: post-joint baseline RMSD [bnd]
std|trend|win_bl_rms_pre: pre-joint baseline RMSD [bnd]
std|trend|win_bl_rmsR: baseline fitting error ratio joint vs pre+post [bnd]
std|trend|win_bl_rmsR_post: baseline fitting error ratio joint vs post [bnd]
std|trend|win_bl_rmsR_pre: baseline fitting error ratio joint vs pre [bnd]
std|trend|win_bl_sd_post: baseline slope diff post-joint [bnd]
std|trend|win_bl_sd_pre: baseline slope diff pre-joint [bnd]
std|trend|win_bl_sd_prepost: baseline slope diff pre-post [bnd]
std|trend|win_ml_aicI: midline fitting AIC increase joint vs pre+post [bnd]
std|trend|win_ml_aicI_post: midline fitting AIC increase joint vs post [bnd]
std|trend|win_ml_aicI_pre: midline fitting AIC increase joint vs pre [bnd]
std|trend|win_ml_corrD: pre/post-joint midline corr-based distance [bnd]
std|trend|win_ml_corrD_post: post-joint midline corr-based distance [bnd]
std|trend|win_ml_corrD_pre: pre-joint midline corr-based distance [bnd]
std|trend|win_ml_d_m: difference of midline means pre-post [bnd]
std|trend|win_ml_d_o: difference of midline onsets pre-post [bnd]
std|trend|win_ml_r: pre-post midline reset [bnd]
std|trend|win_ml_rms: pre/post-joint midline RMSD [bnd]
std|trend|win_ml_rms_post: post-joint midline RMSD [bnd]
std|trend|win_ml_rms_pre: pre-joint midline RMSD [bnd]
std|trend|win_ml_rmsR: midline fitting error ratio joint vs pre+post [bnd]
std|trend|win_ml_rmsR_post: midline fitting error ratio joint vs post [bnd]
std|trend|win_ml_rmsR_pre: midline fitting error ratio joint vs pre [bnd]
std|trend|win_ml_sd_post: midline slope diff post-joint [bnd]
std|trend|win_ml_sd_pre: midline slope diff pre-joint [bnd]
std|trend|win_ml_sd_prepost: midline slope diff pre-post [bnd]
std|trend|win_rng_aicI: range fitting AIC increase joint vs pre+post [bnd]
std|trend|win_rng_aicI_post: range fitting AIC increase joint vs post [bnd]
std|trend|win_rng_aicI_pre: range fitting AIC increase joint vs pre [bnd]
std|trend|win_rng_corrD: pre/post-joint range line corr-based distance [bnd]
std|trend|win_rng_corrD_post: post-joint range line corr-based distance [bnd]
std|trend|win_rng_corrD_pre: pre-joint range line corr-based distance [bnd]
std|trend|win_rng_d_m: difference of range line means pre-post [bnd]
std|trend|win_rng_d_o: difference of range line onsets pre-post [bnd]
std|trend|win_rng_r: pre-post range line reset [bnd]
std|trend|win_rng_rms: pre/post-joint range line RMSD [bnd]
std|trend|win_rng_rms_post: post-joint range line RMSD [bnd]
std|trend|win_rng_rms_pre: pre-joint range line RMSD [bnd]
std|trend|win_rng_rmsR: range fitting error ratio joint vs pre+post [bnd]
std|trend|win_rng_rmsR_post: range fitting error ratio joint vs post [bnd]
std|trend|win_rng_rmsR_pre: range fitting error ratio joint vs pre [bnd]
std|trend|win_rng_sd_post: range line slope diff post-joint [bnd]
std|trend|win_rng_sd_pre: range line slope diff pre-joint [bnd]
std|trend|win_rng_sd_prepost: range line slope diff pre-post [bnd]
std|trend|win_tl_aicI: topline fitting AIC increase joint vs pre+post [bnd]
std|trend|win_tl_aicI_post: topline fitting AIC increase joint vs post [bnd]
std|trend|win_tl_aicI_pre: topline fitting AIC increase joint vs pre [bnd]
std|trend|win_tl_corrD: pre/post-joint topline corr-based distance [bnd]
std|trend|win_tl_corrD_post: post-joint topline corr-based distance [bnd]
std|trend|win_tl_corrD_pre: pre-joint topline corr-based distance [bnd]
std|trend|win_tl_d_m: difference of topline means pre-post [bnd]
std|trend|win_tl_d_o: difference of topline onsets pre-post [bnd]
std|trend|win_tl_r: pre-post topline reset [bnd]
std|trend|win_tl_rms: pre/post-joint topline RMSD [bnd]
std|trend|win_tl_rms_post: post-joint topline RMSD [bnd]
std|trend|win_tl_rms_pre: pre-joint topline RMSD [bnd]
std|trend|win_tl_rmsR: topline fitting error ratio joint vs pre+post [bnd]
std|trend|win_tl_rmsR_post: topline fitting error ratio joint vs post [bnd]
std|trend|win_tl_rmsR_pre: topline fitting error ratio joint vs pre [bnd]
std|trend|win_tl_sd_post: topline slope diff post-joint [bnd]
std|trend|win_tl_sd_pre: topline slope diff pre-joint [bnd]
std|trend|win_tl_sd_prepost: topline slope diff pre-post [bnd]
stm: f0 file name stem [glob, loc]
t_off: time offset (sec; bnd: of pre-boundary segment) [glob, loc, gnl_f0/en, rhy_f0/en, bnd]
t_on: time onset (sec; bnd: of post-boundary segment) [glob, loc, gnl_f0/en, rhy_f0/en, bnd]
tier: tier name [bnd, gnl_f0/en, rhy_f0/en]
tl_c0: topline intercept [glob]
tl_c1: topline slope [glob]
tl_d_fin: final topline value diff loc-glob [loc]
tl_d_init: initial topline value diff loc-glob [loc]
tl_m: topline mean value [glob, loc]
tl_r: initial topline reset [glob]
tl_rate: topline declination rate [glob, loc]
tl_rms: topline RMSD loc-glob [loc]
tl_sd: topline slope diff loc-glob [loc]
myRateTier_dgm: difference between rate and frequency of max amplitude coef [rhy_f0/en(_file)]
myRateTier_dlm: difference between rate and frequency of nearest peak coef [rhy_f0/en(_file)]
myRateTier_mae: meanAbsErr(IDCT(myRateTier),contourOfFile) [rhy_f0/en(_file)]
myRateTier_prop: influence of myRateTier on DCT coefs [rhy_f0/en(_file)]
myRateTier_rate: event rate of myRateTier [rhy_f0/en(_file)]
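To make the jit and jit_c* features listed above more concrete, the following is a minimal sketch of relative local jitter and its polynomial time-course stylization as described for the voice feature set. It is not the CoPaSul implementation: the function name local_jitter is hypothetical, the default values for t_min, t_max and fac_max are assumed to match Praat's usual defaults, and time is approximated by the evenly spaced index of each period pair rather than by actual time stamps.

```python
import numpy as np

def local_jitter(periods, t_min=1e-4, t_max=0.02, fac_max=1.3, order=3):
    """Return (jitter, coefs); coefs are ordered c0..c<order> (hypothetical helper)."""
    p = np.asarray(periods, dtype=float)
    a, b = p[:-1], p[1:]  # adjacent period pairs
    # both periods of a pair must lie within the allowed duration range
    in_range = (a >= t_min) & (a <= t_max) & (b >= t_min) & (b <= t_max)
    # the quotient of adjacent periods must not exceed fac_max
    ratio_ok = np.maximum(a, b) <= fac_max * np.minimum(a, b)
    keep = in_range & ratio_ok
    nan_coefs = np.full(order + 1, np.nan)
    if not keep.any():
        return np.nan, nan_coefs
    mean_period = p[(p >= t_min) & (p <= t_max)].mean()
    # distance of adjacent periods, each divided by the mean period
    dist = np.abs(a[keep] - b[keep]) / mean_period
    jitter = dist.mean()
    if len(dist) <= order:
        return jitter, nan_coefs  # too few values for the polynomial fit
    # assumption: pair index as a stand-in for time, normalized to [-1, 1]
    t = np.linspace(-1, 1, len(dist))
    coefs = np.polyfit(t, dist, order)[::-1]  # reverse to c0, c1, ...
    return jitter, coefs

# period durations in seconds with shrinking perturbation over time
periods = [0.010, 0.0112, 0.010, 0.0108, 0.010, 0.0104, 0.010]
jit, jit_c = local_jitter(periods)
print(jit, jit_c[1])  # a negative jit_c[1] reflects decreasing jitter
```

Shimmer could be sketched the same way by passing period amplitudes instead of durations and dropping the duration constraints.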

11 Configurations

The configuration file format is JSON. Examples can be found in the config subfolder of the code distribution. copasul_default_config.json contains all default values. In the doc subfolder you find the file copasul_commented_config.json.txt where all options are commented for a quick overview. In the following detailed introduction of all configuration parameters, the levels of the JSON dictionary are separated by a colon.

For numeric and boolean parameters the "values, default" field contains the default value. For string parameters, the default value is indicated in bold face. If a configuration field is named as my* the name is user defined. + indicates "one or more" configuration branches of this kind. Example: fsys:channel:myTiername+ indicates that the user needs to specify for all tiers in the annotation files to which audio channel they belong. Let's assume there are two tiers spk1 and spk2; the first belongs to channel 1, the second to channel 2. Then fsys:channel:spk1=1 and fsys:channel:spk2=2.

fs
description: f0 sample frequency
type: integer
values, default:
remarks: currently only fs=100 supported. All f0 input will be resampled to this sample rate.

Automatic annotation steps can be carried out independently of each other as long as they don't depend on the output of preceding annotation steps, e.g. if fallback events as syllable boundaries and nuclei are required for phrase boundary and accent detection, or if parent segments are defined to be the result of preceding automatic clustering or prosodic phrasing. Figure 14 displays the possible augmentation pipelines.

[Figure 14: Automatic annotation do_augment_* workflow. Nodes: chunk, syl, glob, loc.]
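The colon notation used in this section maps directly onto nested JSON dictionaries. The following sketch shows how such a path could be written to and read from a configuration dictionary. The helpers set_path and get_path are hypothetical and only serve illustration; they are not part of the toolkit.

```python
import json

def set_path(config, path, value):
    """Store value under a colon-separated path, creating sub-dictionaries."""
    keys = path.split(":")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value

def get_path(config, path):
    """Look up the value stored under a colon-separated path."""
    node = config
    for key in path.split(":"):
        node = node[key]
    return node

config = {}
set_path(config, "fs", 100)
set_path(config, "fsys:channel:spk1", 1)
set_path(config, "fsys:channel:spk2", 2)

# the nested structure as it would appear in a JSON configuration file
print(json.dumps(config, indent=2))
print(get_path(config, "fsys:channel:spk2"))
```

Written this way, fsys:channel:spk1=1 corresponds to the JSON fragment {"fsys": {"channel": {"spk1": 1}}}.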

Pipelines are defined in the navigate configurations. Processing step dependencies are shown in Figure 15.

[Figure 15: Stylization do_styl and clustering do_clst workflow. Nodes: preproc; glob, loc, clst_loc, clst_glob, gnl_f0, gnl_en, bnd, bnd_win, bnd_trend, rhy_f0, rhy_en, voice; export.]

Processing does not always need to start from scratch. Intermediate feature extraction results are stored in Python pickle format and can be reloaded for further processing in a later session. The name of the pickle file to be loaded is given in fsys:export:dir + fsys:export:stm. In order to continue an analysis of a previous session, the user thus needs to make sure that output directory and file name stem do not change across sessions. The content of the file can be deleted by setting navigate:from_scratch to 1. This and all other navigate configuration elements are introduced in the following:

navigate:do_augment_chunk
description: apply automatic chunking into interpausal units
type: boolean
values, default:
remarks: If 1, a chunk segment tier is generated for each channel and added to the annotation files.

navigate:do_augment_glob
description: apply unsupervised prosodic phrase extraction
type: boolean
values, default:
remarks: If 1, for each channel a segment tier with automatically extracted prosodic phrases is generated and added to the annotation files. If no input tier for prosodic boundary candidates is specified, this step requires preceding syllable extraction, since syllable boundaries will then be taken as candidates.

navigate:do_augment_loc
description: apply unsupervised pitch accent detection
type: boolean
values, default:
remarks: If 1, for each channel an event tier with automatically extracted pitch accent locations is generated and added to the annotation file. If no user-defined pitch accent candidates can be provided, this step requires preceding syllable nucleus extraction; the nuclei will then be taken as candidates.
navigate:do_augment_syl
description: apply automatic syllable nucleus and boundary detection
type: boolean
values, default:
remarks: If 1, for each channel two event tiers (a syllable nucleus and a boundary tier) are generated and added to the annotation files.

navigate:do_clst_glob
description: apply global contour clustering
type: boolean
values, default:
remarks: cluster global contour line slope coefficients to derive global intonation contour classes.

navigate:do_clst_loc
description: apply local contour clustering
type: boolean
values, default:
remarks: cluster local contour polynomial coefficients to derive local intonation contour classes.

navigate:do_export
description: export the results
type: boolean
values, default:
remarks: generate csv feature table files, and f0 table files

navigate:do_plot
description: plot
type: boolean
values, default:
remarks: online or post-analysis plotting of stylization results. Online plotting serves to check the parameter settings before processing large data.

navigate:do_preproc
description: apply preprocessing
type: boolean
values, default:
remarks: F0 preprocessing and analysis and normalization windowing. If set to 1 at non-initial application to a data set, all information previously gathered from subsequent stylization steps is deleted.

navigate:do_styl_bnd_trend
description: extract boundary features
type: boolean
values, default:
remarks: Extract f0 discontinuity features at each segment boundary or time stamp. This time the pre- and post-boundary units range from file start to the boundary, and from the boundary to the file end. If styl:bnd:cross_chunk is set to 0, and if a chunk tier is given in fsys:chunk:tier, the analysis windows are limited by the start and endpoint of the current chunk.

navigate:do_styl_bnd_win
description: extract boundary features in fixed time windows
type: boolean
values, default:
remarks: Extract f0 discontinuity features. For segment tiers, the pre- and post-boundary units are not given by the adjacent segments as for navigate:do_styl_bnd, but by windows of fixed length. For event tiers the window halves of preproc:point_win centered on a time stamp are considered as pre- and post-boundary units. If styl:bnd:cross_chunk is set to 0, and if a chunk tier is given in fsys:chunk:tier, the analysis windows are limited by the start and endpoint of the current chunk.

navigate:do_styl_bnd
description: extract boundary features
type: boolean
values, default:
remarks: Extract f0 discontinuity features across segments (segment tier input) or at time stamps (event tier input). Only for the former the extracted pause length is meaningful. Discontinuity is amongst others expressed in the deviation of the pre- and post-boundary part from a common declination trend. For segment tiers, this common trend is calculated over both segments. For event tiers, the inter-time stamp intervals are considered as segments.

navigate:do_styl_glob
description: apply global contour stylization
type: boolean
values, default:
remarks: Apply f0 register (level and range) stylizations within global segments as e.g. IPs.

navigate:do_styl_gnl_en
description: extract standard energy features
type: boolean
values, default:
remarks: Extract energy mean, variance and the like.

navigate:do_styl_gnl_f0
description: extract standard f0 features
type: boolean
values, default:
remarks: Extract f0 mean, variance and the like.

navigate:do_styl_loc_ext
description: extract extended feature set for local f0 contours
type: boolean
values, default:
remarks: Extract local register and Gestalt features, i.e. deviation of the local contour from the global register trend.

navigate:do_styl_loc
description: apply local contour stylization
type: boolean
values, default:
remarks: Apply polynomial f0 contour stylization in local segments as e.g. AGs.
navigate:do_styl_rhy_en
description: extract energy rhythm features
type: boolean
values, default:
remarks: apply DCT analyses on energy contour within user-defined segments and calculate the influence of events on the contour, in terms of the relative weight of DCT coefficients

navigate:do_styl_rhy_f0
description: extract f0 rhythm features
type: boolean
values, default:
remarks: apply DCT analyses on f0 contour within user-defined segments and calculate the influence of events on the contour, in terms of the relative weight of DCT coefficients

navigate:do_styl_voice
description: extract voice quality features
type: boolean
values, default:
remarks: extract jitter and shimmer

navigate:from_scratch
description: start from scratch
type: boolean
values, default:
remarks: If 1, all configurations and analysis results in the pickle file are overwritten.

navigate:overwrite_config
description: overwrite stored configurations
type: boolean
values, default:
remarks: If 1, the configuration stored in the pickle file is overwritten by the current user-defined setting. Useful, if e.g. selected analysis steps should be repeated with different preprocessing settings.
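The navigate switches depend on each other (stylization needs preprocessing, local stylization needs global stylization, clustering needs the stylization of the same type). As a rough illustration, the sketch below flags switches whose prerequisite is not enabled in the same navigate dictionary. The function missing_prerequisites is hypothetical, the key names assume that identifiers are joined by underscores, and a flagged step is only a hint: the prerequisite may already have been carried out in an earlier session and stored in the pickle file.

```python
def missing_prerequisites(navigate):
    """Return (step, required_step) pairs for enabled steps whose
    prerequisite is not enabled in the same navigate dictionary."""
    on = {key for key, value in navigate.items() if value}
    issues = []
    for step in sorted(on):
        # all stylization steps build on the preprocessing step
        if step.startswith("do_styl") and "do_preproc" not in on:
            issues.append((step, "do_preproc"))
        # local contours are stylized on residuals of the global stylization
        if step == "do_styl_loc" and "do_styl_glob" not in on:
            issues.append((step, "do_styl_glob"))
        # clustering needs the stylization of the same type (loc or glob)
        if step.startswith("do_clst_"):
            required = "do_styl_" + step[len("do_clst_"):]
            if required not in on:
                issues.append((step, required))
    return issues

navigate = {
    "do_preproc": 0,
    "do_styl_glob": 0,
    "do_styl_loc": 1,
    "do_clst_loc": 1,
}
for step, required in missing_prerequisites(navigate):
    print(f"{step} expects a preceding {required}")
```

The conditional dependency of do_styl_bnd on do_styl_glob (only when boundary features are taken from f0 residuals) is left out, since it depends on further options outside the navigate branch.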

There are the following dependencies among the processing steps:

• all do_styl* steps require preceding do_preproc
• do_styl_loc requires preceding do_styl_glob
• do_styl_bnd requires preceding do_styl_glob if the boundary features are to be extracted from the f0 residuals
• all do_clst* steps require a preceding do_styl* step of the same type (loc or glob)

If the preprocessing step navigate:do_preproc is repeated, all already extracted features are deleted since the updated preprocessing configuration might lead to different stylization results. Thus by repeating this step the user needs to redo all subsequent stylizations.

fsys:annot:dir
description: annotation file directory
type: string
values, default:
remarks: Can be nested. Depending on the task, audio, f0, and annotation files are obligatory or not. All obligatory directories must contain the same number of files in the same order. Optimally, same order is guaranteed using the same file name stem for corresponding audio, f0, and annotation files. However, this is not required.

fsys:annot:ext
description: annotation file extension
type: string
values, default: TextGrid, xml
remarks: no default

fsys:annot:typ
description: annotation file type
type: string
values, default: TextGrid, xml
remarks: Currently, only TextGrid and xml (see section 4.4) are supported. No default.

fsys:aud:dir
description: audio file directory
type: string
values, default:
remarks: Can be nested. Depending on the task, audio, f0, and annotation files are obligatory or not. All obligatory directories must contain the same number of files in the same order. Optimally, same order is guaranteed using the same file name stem for corresponding audio, f0, and annotation files. However, this is not required.

fsys:aud:ext
description: audio file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:aud:typ
description: audio file mimetype
type: string
values, default: wav
remarks: currently only wav supported

fsys:augment:chunk:tier_out_stm
description: tier name stem of chunking output
type: string
values, default: chunk
remarks:
To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:chunk:tier_out_stm=CHUNK, the two segment tiers CHUNK_1 and CHUNK_2 will be generated for channel 1 and 2, respectively.

fsys:augment:glob:tier_out_stm
description: phrasing output tier
type: string
values, default: glob
remarks: tier name stem of phrasing output. To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:glob:tier_out_stm="IP", the two segment tiers IP_1 and IP_2 will be generated for channel 1 and 2, respectively.

fsys:augment:glob:tier_parent
description: parent tier for prosodic phrase extraction
type: string or list of strings
values, default: fsys:augment:chunk:tier_out_stm
remarks: Segment tiers defining the superordinate domain for overall trend measurement from which the pre- and post-candidate-boundary segments deviate. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. For segment tiers only.

fsys:augment:glob:tier
description: The tier in which to look for the prosodic boundary candidates.
type: string or list of strings
values, default: fsys:augment:syl:tier_out_stm + '_bnd'
remarks: This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index and the syllable boundary infix). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. Tiers can be of segment or event type. Default is the bnd output of fsys:augment:syl:tier_out_stm. Note that treating all syllable boundaries as phrase boundary candidates may result in prosodic boundaries within words. Thus a word segmentation tier is strongly recommended.

fsys:augment:loc:tier_acc
description: Pitch accent extraction event tier
type: string or list of strings
values, default: [ ]
remarks:
Pitch accent candidate time stamps, e.g. syllable nucleus midpoints. This field can contain a single string (a single tier for mono files or fsys:augment:syl:tier_out_stm which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from fsys:augment:syl:tier_out_stm. For event tiers only. Field can be empty, but at least one of fsys:augment:loc:tier_ag and fsys:augment:loc:tier_acc needs to be specified. If only fsys:augment:loc:tier_ag: analysis within segment; if only fsys:augment:loc:tier_acc: analysis within symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within ag segment, time normalization so that 0 position is at acc time stamp within ag.

fsys:augment:loc:tier_ag
description: pitch accent extraction segment tier
type: string or list of strings
values, default: [ ]
remarks: Tier with segments that are potential accent groups. This field can contain a single string for mono files or a list of strings for more channels. Tiers can be of segment type only. Field can be empty, but at least one of fsys:augment:loc:tier_ag and fsys:augment:loc:tier_acc needs to be specified. If only fsys:augment:loc:tier_ag: analysis within segment; if only fsys:augment:loc:tier_acc: analysis within symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within ag segment, time normalization so that 0 position is at acc time stamp within ag.

fsys:augment:loc:tier_out_stm
description: accent output tier name stem
type: string
values, default: acc
remarks: To the name stem the channel index will be added (also for mono files!). E.g. given a stereo file and fsys:augment:loc:tier_out_stm="ACC", the two event tiers ACC_1 and ACC_2 will be generated for channel 1 and 2, respectively.

fsys:augment:loc:tier_parent
description: name of parent tier for pitch accent candidates
type: string or list of strings
values, default: [ ]
remarks: This parent tier contains segments of a superordinate domain with respect to which the deviation of the accent candidate segments or time stamps is calculated. This might be global segments or chunks. Fallback is file-level. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list, if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm. Tiers can be segment tiers only.

fsys:augment:syl:tier_out_stm
description: tier name stem of syllable nucleus and boundary output
type: string
values, default: syl
remarks: To the name stem the channel index will be added (also for mono files!). Syllable boundary tiers are further marked by the infix bnd. E.g. given a stereo file and fsys:augment:syl:tier_out_stm="SYL", the four event tiers SYL_1, SYL_bnd_1 and SYL_2, SYL_bnd_2 will be generated for syllable nuclei and boundaries and for channel 1 and 2, respectively.

fsys:augment:syl:tier_parent
description: parent tier for syllable nucleus extraction
type: string or list of strings
values, default: chunk
remarks:
The parent tier defines the boundaries that the reference window for relative energy calculation must not cross. Fallback is file level. This field can contain a single string (a single tier for mono files or any fsys:augment:*:tier_out_stm value, which will be expanded by the channel index). The user can also explicitly specify multiple tier names in a list if several channels are to be processed and the tier names cannot be derived from any fsys:augment:*:tier_out_stm.

fsys:bnd:tier
description: boundary tier names
type: string or list of strings
values, default: [ ]
remarks: Each channel can contain several tiers to be analyzed. Segment or event tiers. For segment tiers the boundary between adjacent segments is parameterized; for point tiers, the boundary at the time stamps.

fsys:channel:myTiername+
description: channel index for each relevant tier name in the annotation file
type: int
values, default: myChannelIdx
remarks: For augmentation output tiers this configuration branch is generated automatically.

fsys:chunk:tier
description: chunk tier names
type: string or list of strings
values, default: [ ]
remarks: One item for each channel. In case of multiple channels and a single string, this string (e.g. "chunk") is expanded to "chunk_1", "chunk_2", ... for each available channel index. If chunk tiers are specified, their segments' boundaries are not crossed by analysis and normalization windows for most feature sets. For the bnd_trend feature set, pre- and post-boundary segments are limited by the start and endpoint of the superordinate chunk if styl:bnd:cross_chunk is set to 1.

fsys:export:csv
description: output csv tables
type: boolean
values, default:
remarks: If 1, a csv file is written for each extracted feature set, together with a code template file to read the table in R. The file names are concatenated from fsys:export:stm and the name of the feature set.

fsys:export:dir
description: output directory
type: string
values, default:
remarks: Directory in which all csv tables, the log file, and the pickle file are stored.

fsys:export:f0_preproc
description: output preprocessed f0 contours
type: boolean
values, default:
remarks: If 1, preprocessed f0 values are written for each input f0 file. The output format is as specified in section 4.2. The output is stored in the subdirectory f0_preproc below the directory fsys:export:dir.

fsys:export:f0_residual
description: output residual f0 contours
type: boolean
values, default:
remarks: If 1, residual f0 contours after register removal are written for each input f0 file. The output format is as specified in section 4.2. The output is stored in the subdirectory f0_residual below the directory fsys:export:dir.

fsys:export:f0_resyn
description: output resynthesized f0 contours
type: boolean
values, default:
remarks: If 1, the resynthesized f0 contours as a superposition of global and local contour shapes are written for each input f0 file. The output format is as specified in section 4.2. The output is stored in the subdirectory f0_resyn below the directory fsys:export:dir.

fsys:export:fullpath
description: whether or not to write the full path to the csv tables into the R code template files
type: boolean
values, default:
remarks: If 1, the full path to the csv tables is written into the R code. 0 is recommended in case the data is shared and further processed at different locations.

fsys:export:sep
description: table column separator
type: string
values, default: ,
remarks: Column separator for csv output tables.

fsys:export:stm
description: output file name stem
type: string
values, default: copasul
remarks: Same file name stem for all csv files, the log file, and the pickle file.

fsys:export:summary
description: output file/channel summary statistics
type: boolean
values, default:
remarks: If 1, mean and variance values are calculated per file and analysis tier for all continuous-valued features in the feature-set related csv files. For categorical features unigram entropies are calculated. A fsys:export:stm.summary.csv file is written together with an R code template file to read the table in R.

fsys:f0:dir
description: f0 file directory
type: string
values, default:
remarks:

F0 file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:f0:typ
description:
type: string
values, default: tab
remarks: Currently only tab supported.

fsys:glob:tier
description: global segment tier names
type: string or list of strings
values, default: [ ]
remarks: Analysis tiers for global segments. Only one per channel is supported, so that global and local segments can be assigned to each other. If taken over from fsys:augment:*:tier_out_stm, the names must be extended by the corresponding channel index, e.g. IP_1 etc., see fsys:augment:*:tier_out_stm. Segment or event tier. Events are considered to be right boundaries of segments and are expanded accordingly to segments.

fsys:gnl_en:tier
description: tiers for standard energy variable extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:gnl_f0:tier
description: tiers for standard f0 variable extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:grp:lab
description: grouping labels with values derived from file names
type: list of strings
values, default: [ ]
remarks: Labels of file-name derived grouping. Non-relevant file parts are indicated by empty strings ''. E.g. given the f0 file name stem a_b_2, where "a" represents the speaker ID, "b" is not relevant for the current analysis, and "2" represents the stimulus ID: set fsys:grp:src=f0, fsys:grp:sep=_, and fsys:grp:lab=['spk','','stim']. The output csv tables then contain two additional grouping columns grp_spk and grp_stim with values derived from the file names (in this case "a" and "2"). Note that all grouping values are treated as strings.

fsys:grp:sep
description: file name split pattern
type: string
values, default:
remarks: How to split the file name to access the grouping values. The string is interpreted as a regular expression, so predefined characters such as the dot need to be protected. If file parts are separated by a dot, set this option to "\\.". If file parts are separated by more than one symbol, e.g. dot and underscore, use "(_|\.)".

fsys:grp:src
description: grouping source
type: string
values, default: f0, annot, aud
remarks: From which file type to derive the file name based grouping.

fsys:label:chunk
description: chunk label
type: string
values, default: x
remarks: Will be used by automatic chunking.

fsys:label:pau
description: pause label
type: string
values, default: <P>
remarks: In annotation files, segments labeled by this symbol are treated as pauses and are not analyzed. For boundary feature extraction these segments define the pause length feature between the preceding and following segment. Note that this symbol as a pause identifier must be uniform over all analyzed tiers. In Praat TextGrids, unlabeled segments are also considered as pauses.
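The file-name based grouping described for fsys:grp:lab and fsys:grp:sep can be illustrated with a minimal Python sketch. The function name `grouping_from_stem` is hypothetical and not part of CoPaSul; it only mirrors the behaviour described above (regular-expression split, empty labels skipped, values kept as strings). Note that with Python's plain `re.split`, a capturing group such as "(_|\.)" would also return the separators, so the sketch uses a non-capturing group for the multi-separator case.

```python
import re


def grouping_from_stem(stem, sep, labels):
    """Split a file name stem and map its parts to grouping columns.

    Hypothetical helper for illustration only.
    stem   -- file name stem, e.g. "a_b_2"
    sep    -- split pattern, interpreted as a regular expression
    labels -- one label per file name part; '' marks irrelevant parts
    """
    parts = re.split(sep, stem)
    groups = {}
    for label, part in zip(labels, parts):
        if label:  # empty label: this file name part is ignored
            groups["grp_" + label] = part  # values stay strings
    return groups


# underscore-separated stem, as in the example above
print(grouping_from_stem("a_b_2", "_", ["spk", "", "stim"]))

# parts separated by dot or underscore (non-capturing group)
print(grouping_from_stem("a.b_2", r"(?:_|\.)", ["spk", "", "stim"]))
```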
fsys:label:syl
description: syllable label
type: string
values, default: x
remarks: Will be used by automatic syllable extraction.

fsys:loc:tier_acc
description: local event tier names
type: string or list of strings
values, default: [ ]
remarks: Tier (one for each channel) defining pitch accent time stamps. Event tiers only. Field can be empty, but at least one of fsys:loc:tier_ag and fsys:loc:tier_acc needs to be specified. If only fsys:loc:tier_ag: analysis within segment; if only fsys:loc:tier_acc: analysis within a symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within the ag segment, with time normalization so that the 0 position is at the acc time stamp within ag.

fsys:loc:tier_ag
description: local segment tier names
type: string or list of strings
values, default: [ ]
remarks: Tier (one for each channel) defining accent group-like units. Segment tiers only. Field can be empty, but at least one of fsys:loc:tier_ag and fsys:loc:tier_acc needs to be specified. If only fsys:loc:tier_ag: analysis within segment; if only fsys:loc:tier_acc: analysis within a symmetric window of length preproc:point_win centered on the time stamp; if both: analysis within the ag segment, with time normalization so that the 0 position is at the acc time stamp within ag.

fsys:pho:tier
description: name of tier with phonetic segments
type: string or list of strings
values, default: [ ]
remarks: One tier per channel. Used for feature extraction in prosodic boundary and accent localization.

fsys:pho:vow
description: vowel pattern
type: string
values, default: [AEIOUYaeiouy29{]
remarks: To identify vowel segments in fsys:pho:tier. Is interpreted as a regular expression.

fsys:pic:dir
description: directory for plotting output
type: string
values, default:
remarks: Directory for the png files generated by plotting.
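As an illustration of how a vowel pattern like the fsys:pho:vow default can be applied to segment labels, here is a short Python sketch. It assumes the default is the character class `[AEIOUYaeiouy29{]` (the spaces in the extracted text are taken to be artifacts) and that a label counts as a vowel if the pattern matches anywhere in it; whether CoPaSul uses exactly this matching mode is an assumption. The helper `is_vowel` is hypothetical.

```python
import re

# assumed form of the default vowel pattern
VOW = r"[AEIOUYaeiouy29{]"


def is_vowel(label, pattern=VOW):
    """Return True if the segment label matches the vowel pattern."""
    return re.search(pattern, label) is not None


# SAMPA-like example labels (illustrative only)
labels = ["t", "a:", "n", "{", "2:", "s"]
print([lab for lab in labels if is_vowel(lab)])
```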
fsys:pic:stm
description: file name stem of the plot files
type: string
values, default: copasul
remarks:

fsys:pulse:dir
description: pulse file directory
type: string
values, default:
remarks: Can be nested. Pulse files are obligatory only for extracting voice quality features. All obligatory directories must contain the same number of files in the same order. Optimally, the same order is guaranteed by using the same file name stem for corresponding audio, f0, pulse, and annotation files. However, this is not required.

fsys:pulse:ext
description: pulse file extension
type: string
values, default:
remarks: Only files with this extension are collected from the directory.

fsys:pulse:typ
description:
type: string
values, default: tab
remarks: Currently only tab supported.

fsys:rhy_en:tier
description: tiers for energy rhythm extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:rhy_en:tier_rate
description: tiers containing units whose rate is to be calculated within each segment of the fsys:rhy_f0:tier tiers
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers.

fsys:rhy_f0:tier
description: tiers for f0 rhythm extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

fsys:rhy_f0:tier_rate
description: tiers containing units whose rate is to be calculated within each segment of the fsys:rhy_f0:tier tiers
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers.

fsys:voice:tier
description: tiers for voice quality extraction
type: string or list of strings
values, default: [ ]
remarks: More than one tier per channel supported. Segment or event tiers. Events are expanded to segments by preproc:point_win.

preproc:base_prct
description: percentile below which the base value for the semitone transform is calculated
type: float ]0 100[
values, default:
remarks: The base value for the semitone transform is defined as the median of the values below the specified percentile. If set to 0, the base value will be set to 1, i.e. the semitone transform is carried out without normalization.

preproc:base_prct_grp:myChannelIndex
description:

Grouping variable for each of whose levels a base value for the f0 semitone transform is calculated
type: string
values, default: ' '
remarks: Indicates for each channel index which grouping variable is relevant. The grouping variable must be extractable from the file name as specified in fsys:grp. E.g. preproc:base_prct_grp:1=spkId requires a spkId element in the list of fsys:grp:lab. Channel indices must be written in quotation marks as strings.

preproc:loc_align
description: robust treatment of local segments to which more than one center is assigned in the annotation
type: string
values, default: skip, left, right
remarks: skip: such local segments are skipped; left: the first center is kept; right: the last center is kept.

preproc:loc_sync
description: extract gnl_* and rhy_* features only at locations where loc features can be obtained
type: boolean
values, default:
remarks: Due to the strict hierarchy principle and to window length constraints it is not always possible to extract loc features at every location where gnl and rhy features can be obtained. If the user is interested only in locations where all these feature sets are available, so that the corresponding feature matrices can be concatenated, this option should be set to 1.

preproc:nrm_win
description: normalization window length (in sec)
type: float
values, default:
remarks: Length of the normalization window. For the feature sets gnl_*, all mean, max, and std values derived in the analysis window are normalized within a longer time window whose length is defined by this parameter. If segments to be analyzed are longer than the normalization window, this window is set equal to the analyzed segment. nrm_win can also be set individually for each of the feature sets loc, gnl_f0, gnl_en, rhy_f0, rhy_en (see section 10) by specifying preproc:myFeatureSet:nrm_win.

preproc:out:f
description: outlier definition factor
type: float
values, default:
remarks: Identifies non-zero f0 values as outliers if they deviate from the mean value by more than this factor times the dispersion. If preproc:out:m=mean, the mean value is given by the arithmetic mean and the dispersion by the standard deviation. If preproc:out:m=median, the mean value is given by the median and the dispersion by the interquartile range. If preproc:out:m=fence, the first and third quartiles are used as references instead of the mean value, and the dispersion is given by the interquartile range (Tukey's fences).

preproc:out:m
description: reference value definition for outlier identification
type: string
values, default: mean, median, fence
remarks: Specifies the definition of mean/fence and dispersion, see preproc:out:f for details.
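The three outlier definitions selected by preproc:out:m can be sketched as follows. This is a minimal NumPy illustration of the description above, not the CoPaSul implementation; the function name `f0_outliers` and the example factor values are hypothetical.

```python
import numpy as np


def f0_outliers(f0, f=2.0, m="mean"):
    """Boolean mask of outliers among the non-zero f0 values.

    f -- outlier definition factor (cf. preproc:out:f)
    m -- "mean", "median", or "fence" (cf. preproc:out:m)
    """
    f0 = np.asarray(f0, dtype=float)
    voiced = f0 > 0            # only non-zero f0 values are examined
    y = f0[voiced]
    out = np.zeros(f0.shape, dtype=bool)
    if y.size == 0:
        return out
    q1, q3 = np.percentile(y, [25, 75])
    iqr = q3 - q1
    if m == "mean":            # arithmetic mean +/- f * standard deviation
        lower, upper = y.mean() - f * y.std(), y.mean() + f * y.std()
    elif m == "median":        # median +/- f * interquartile range
        med = np.median(y)
        lower, upper = med - f * iqr, med + f * iqr
    elif m == "fence":         # Tukey's fences around the quartiles
        lower, upper = q1 - f * iqr, q3 + f * iqr
    else:
        raise ValueError("m must be 'mean', 'median', or 'fence'")
    out[voiced] = (y < lower) | (y > upper)
    return out


contour = [0, 100, 102, 98, 101, 99, 400, 0]   # one octave-jump-like spike
print(f0_outliers(contour, f=1.5, m="fence"))
```

Zero values (unvoiced frames) are never flagged, in line with the restriction to non-zero f0 values stated above.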
preproc:point_win
description: window length to transform events to segments (in sec)
type: float
values, default:
remarks: The extraction of the feature sets gnl_*, rhy_*, glob, loc is based on segments. For event tier input, segments are obtained by centering a window of this length on the time stamps. point_win can also be set individually for each of the feature sets loc, gnl_f0, gnl_en, rhy_f0, rhy_en (see section 10) by specifying preproc:myFeatureSet:point_win.

preproc:smooth:mtd
description: F0 smoothing method
type: string
values, default: sgolay, med
remarks: Savitzky-Golay or median filtering of the f0 contour. Median filtering yields stronger smoothing; Savitzky-Golay performs better in keeping local minima and maxima in place.

preproc:smooth:ord
description: polynomial order of smoothing method
type: integer
values, default:
remarks: Relevant for preproc:smooth:mtd=sgolay only.

preproc:smooth:win
description: smoothing window length (in f0 sample indices)
type: int
values, default:
remarks: The longer the smoothing window, the smoother the f0 contours.

preproc:st
description: Hertz to semitone conversion
type: boolean
values, default:
remarks: If 1, f0 is transformed to semitones.

augment:chunk:e_rel
description: proportion of reference energy below which a pause is assumed
type: float
values, default:
remarks: A pause is indicated if the energy in the analysis window is below this factor times the energy in the longer reference window.

augment:chunk:fbnd
description: assume pause at beginning and end of file
type: boolean
values, default:
remarks: If set to 1, forced pause detection at file start and end. These pauses are subtracted from augment:chunk:n if set.

augment:chunk:flt:btype
description: filter type
type: string
values, default: low, high, band
remarks:

Butterworth filter type to filter the signal for pause detection. Recommended: low.

augment:chunk:flt:f
description: filter cutoff frequencies (in Hz)
type: float or list of floats
values, default: 8000
remarks: For augment:chunk:flt:btype=low, high a single cutoff frequency is expected; for band a 2-element list of lower and upper cutoff frequency.

augment:chunk:flt:ord
description: filter order
type: int
values, default:
remarks: Butterworth filter order.

augment:chunk:l_ref
description: reference window length for pause detection (in sec)
type: int
values, default:
remarks: The energy in the analysis window of length augment:chunk:l is compared against the energy within the reference window. Same midpoint as analysis window.

augment:chunk:l
description: length of the analysis window (in sec)
type: float
values, default:
remarks: Analysis window for which it is to be decided whether or not it is (part of) a pause.

augment:chunk:margin
description: silence margin at chunk start and end (in sec)
type: float
values, default:
remarks: Chunks are extended by this amount on both sides.

augment:chunk:min_chunk_l
description: minimum chunk length (in sec)
type: float
values, default:
remarks: Shorter chunks are merged.

augment:chunk:min_pau_l
description: minimum pause length (in sec)
type: float
values, default:
remarks: Shorter pauses are ignored.

augment:chunk:n
description: pre-specified number of pauses [sic!]
type: int
values, default: -1
remarks: In this implementation chunks are defined as interpausal units and thus depend on pause detection. If set to -1, no pause number is pre-specified.

augment:syl:d_min
description: minimum distance between subsequent syllable nuclei (in sec)
type: float
values, default:
remarks: If 2 detected nuclei are closer than this distance, the weaker candidate is discarded.

augment:syl:e_min
description: minimum energy factor relative to entire file
type: float
values, default:
remarks: For a syllable nucleus the RMS energy in the analysis window must be above this factor times the energy in the entire file.

augment:syl:e_rel
description: minimum energy factor relative to reference window
type: float
values, default:
remarks: For a syllable nucleus the RMS energy in the analysis window must be above this factor times the energy in the reference window.

augment:syl:flt:btype
description: filter type
type: string
values, default: low, high, band
remarks: Butterworth filter type to filter the signal for syllable nucleus detection. Recommended: band.

augment:syl:flt:f
description: filter cutoff frequencies (in Hz)
type: float or list of floats
values, default: [200 4000]
remarks: For augment:syl:flt:btype=low, high a single cutoff frequency is expected; for band a 2-element list of lower and upper cutoff frequency.

augment:syl:flt:ord
description: filter order
type: int
values, default:
remarks: Butterworth filter order.

augment:syl:l_ref
description: reference window length for syllable detection (in sec)
type: float
values, default:
remarks: The energy in the analysis window with the same midpoint is compared against the energy within the reference window.

augment:syl:l
description: analysis window length (in sec)
type: float
values, default:
remarks: Length of the window within which energy is calculated. Same midpoint as reference window.

augment:glob:cntr_mtd
description: how to define cluster centroids
type: string
values, default: seed_prct, seed_kmeans, split
remarks: seed_*: initialize clustering by bootstrapped seed centroids. seed_prct: single-pass clustering of the boundary candidates by their distance to these centroids. Distance values to the no-boundary seed above a specified percentile augment:glob:prct indicate boundaries. seed_kmeans: kmeans clustering initialized by the seed centroids (gives a more balanced amount of boundary/no-boundary cases than seed_prct). split: centroids are derived by splitting each column in the feature matrix at the percentile augment:glob:prct; the boundary centroid is defined by the median of the values above the splitpoint, the no-boundary centroid by the median of the values below; items are then assigned to the nearest centroid in a single pass. Depending on augment:glob:unit, clustering is carried out either separately within each file and each channel, or over the entire dataset. Fallback: if cluster centroids cannot be bootstrapped, this parameter's value is changed to split.

augment:glob:heuristics
description: heuristic macro settings
type: string
values, default: ORT
remarks:

Only ORT supported. ORT assumes a word segmentation tier for prosodic boundary prediction and rejects boundaries after too short and thus probably function words ( < . s ). Not necessarily meaningful for any language.

augment:glob:measure
description: feature values, or deltas
type: string
values, default: abs, delta, abs+delta
remarks: Which values v to put in the feature matrix (i = time index): abs: feature values $v[i]$; delta: feature deltas $v[i] - v[i-1]$; abs+delta: both.

augment:glob:min_l
description: minimum inter-boundary distance (in sec)
type: float
values, default:
remarks: If 2 detected boundaries are closer than this value, only the stronger one will be kept. This distance is also used in bootstrapping boundary and no-boundary centroids as described in section 7.3.

augment:glob:prct
description: percentile of cluster splitpoint
type: float ]0 100[
values, default:
remarks: Splitpoint definition for clustering in terms of a percentile value. The higher the value, the fewer boundaries will be detected. For augment:glob:cntr_mtd=split the percentile refers to the feature values; for augment:glob:cntr_mtd=seed_prct it refers to the distance to the no-boundary seed centroid.

augment:glob:unit
description: derive centroids separately for each file or over entire data set
type: string
values, default: batch, file
remarks: batch mode is recommended for corpora containing lots of short recordings, within which centroids cannot reliably be extracted.

augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+
description: user defined feature weights
type: float
values, default:
remarks: Create one config branch for each selected boundary feature and assign a weight. Only boundary features are supported. The weight becomes a dummy if augment:glob:wgt_mtd is not user. However, the branches must be specified in order to mark which features are to be used for boundary prediction. myBndFeatset ∈ {std, win, trend}, myRegister ∈ {bl, ml, tl, rng}, myFeat ∈ {r, rms, rms_pre, ...}. The branches must correspond to branches in the sub-dictionary copa:data:myFileIdx:myChannelIdx:bnd:myTierNameIndex:myBoundaryIndex (see section 12.3). E.g. copa:data:myFileIdx:myChannelIdx:bnd:myTierNameIndex:myBoundaryIndex:win:bl:r is addressed by augment:glob:wgt:win:bl:r.

augment:glob:wgt:pho
description: use/weight normalized vowel length as feature
type: float
values, default:
remarks: Only compliant with augment:glob:unit=batch.

augment:glob:wgt_mtd
description: feature weighting method
type: string
values, default: silhouette, correlation, user
remarks: For silhouette an initial clustering is carried out, and each feature's weight is then defined by its cluster-separating power. For correlation the weight of each feature is defined by its correlation to the feature vector medians. For user, the weights specified in the augment:glob:wgt:myBndFeatset+:myRegister+:myFeat+ branches are taken.

augment:loc:acc_select
description: which syllable within a segment to select
type: string
values, default: max, left, right
remarks:

Choose the accent position among all time stamps in augment:loc:tier_acc that are in the same segment of fsys:augment:loc:tier_ag. max: the most prominent one; left, right: accent the first/last syllable, which might be useful if fsys:augment:loc:tier_ag contains word segments and word stress is fixed.

augment:loc:ag_select
description: which segments to select for accentuation
type: string
values, default: max, all
remarks: all: assign an accent to each segment in fsys:augment:loc:tier_ag; max: assign accents to the most prominent segments only.

augment:loc:cntr_mtd
description: how to define cluster centroids
type: string
values, default: seed_prct, seed_kmeans, split
remarks: seed_*: initialize clustering by bootstrapped seed centroids. seed_prct: single-pass clustering of the accent candidates by their distance to these centroids. Distance values to the no-accent seed above a specified percentile augment:loc:prct indicate accents. seed_kmeans: kmeans clustering initialized by the seed centroids (gives a more balanced amount of accent/no-accent cases than seed_prct). split: centroids are derived by splitting each column in the feature matrix at the percentile augment:glob:prct; the accent centroid is defined by the median of the values above the splitpoint, the no-accent centroid by the median of the values below; items are then assigned to the nearest centroid in a single pass. Depending on augment:loc:unit, clustering is carried out either separately within each file and each channel, or over the entire dataset. Fallback: if cluster centroids cannot be bootstrapped, this parameter's value is changed to split.

augment:loc:heuristics
description: heuristic macro settings
type: string
values, default: ORT
remarks: Only ORT supported. ORT assumes a word segmentation tier for accent extraction. Short words (see augment:loc:max_l_na) will be treated as non-accent seeds, long words (see augment:loc:min_l_a) as accent seeds.

augment:loc:max_l_na
description: maximum length of definitely non-accented words (in sec)
type: float
values, default:
remarks: From words below that length the non-accented seed centroid is derived.

augment:loc:measure
description: feature values, or deltas
type: string
values, default: abs, delta, abs+delta
remarks:

Which values v to put in the feature matrix (i = time index): abs: feature values $v[i]$; delta: feature deltas $v[i] - v[i-1]$; abs+delta: both.

augment:loc:min_l_a
description: minimum length of definitely accented words (in sec)
type: float
values, default:
remarks: From words above that length the accented seed centroid is derived.

augment:loc:min_l
description: minimum inter-accent distance (in sec)
type: float
values, default:
remarks: If 2 detected accents are closer than this value, only the more prominent one will be kept.

augment:loc:prct
description: percentile of cluster splitpoint
type: float ]0 100[
values, default:
remarks: Splitpoint definition for clustering in terms of a percentile value. The higher the value, the fewer accents will be detected. For augment:loc:cntr_mtd=split the percentile refers to the feature values; for augment:loc:cntr_mtd=seed_prct it refers to the distance to the no-accent seed centroid.

augment:loc:unit
description: derive centroids separately for each file or over entire data set
type: string
values, default: batch, file
remarks: batch mode is recommended for corpora containing lots of short recordings, within which centroids cannot reliably be extracted.

augment:loc:wgt:myFeatset+:...
description: user defined feature weights
type: float
values, default:
remarks: Create one config branch for each selected prominence feature and assign a weight. myFeatset ∈ {acc, gst, gnl_f0, gnl_en}. The weight becomes a dummy if augment:loc:wgt_mtd is not user. However, the branches must be specified in order to mark which features are to be used for accent prediction. The branches must correspond to branches in the sub-dictionary copa:data:myFileIdx:myChannelIdx:loc (see section 12.3). E.g. copa:data:myFileIdx:myChannelIdx:loc:gst:bl:rms is addressed by augment:loc:wgt:gst:bl:rms. If the value at this branch is a list (e.g. the polynomial coefficients in ...augment:loc:wgt:acc:c), the weight can either be a scalar to weight all list elements equally, or a list of the same length as the value list to weight each element individually. (Only) for polynomial coefficients absolute values are taken.

augment:loc:wgt:pho
description: use/weight normalized vowel length as feature
type: float
values, default:
remarks: Only compliant with augment:loc:unit=batch.

augment:loc:wgt_mtd
description: feature weighting method
type: string
values, default: silhouette, correlation, user
remarks: For silhouette an initial clustering is carried out, and each feature's weight is then defined by its cluster-separating power. For correlation the weight of each feature is defined by its correlation to the feature vector medians. For user, the weights specified in the augment:loc:wgt:...+ branches are taken.

styl:glob:decl_win
description: window length for median calculation (in sec)
type: float
values, default:
remarks: Within each window one median each for the base-, mid-, and topline is derived.

styl:glob:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks: For time normalization in the global segment. Currently only minmax supported.

styl:glob:nrm:rng
description: normalized time range
type: list of floats
values, default: [0 ,
remarks: Normalized time of segment start and endpoint.

styl:glob:prct:bl
description: percentile below which the baseline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of lower range medians is calculated along the f0 contour. The baseline is given by linear regression through this sequence.

styl:glob:prct:tl
description: percentile above which the topline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of upper range medians is calculated along the f0 contour. The topline is given by linear regression through this sequence.
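The interplay of styl:glob:decl_win, styl:glob:prct:bl, and styl:glob:prct:tl can be illustrated by the following NumPy sketch. It is a simplified reading of the description above, not the CoPaSul code: the contour is cut into consecutive, non-overlapping windows (an assumption; the actual window shifting may differ), a lower-range, overall, and upper-range median is taken per window, and a line is fitted through each median sequence. The function name `register_lines` and the example parameter values are hypothetical.

```python
import numpy as np


def register_lines(t, f0, decl_win=0.2, prct_bl=10, prct_tl=90):
    """Fit base-, mid-, and topline to an f0 contour.

    Returns a dict of [slope, intercept] arrays for 'bl', 'ml', 'tl'.
    Assumes consecutive non-overlapping windows of length decl_win (sec)
    and at least two non-empty windows.
    """
    t = np.asarray(t, dtype=float)
    f0 = np.asarray(f0, dtype=float)
    win_idx = np.floor((t - t[0]) / decl_win).astype(int)
    tc, bl, ml, tl = [], [], [], []
    for k in np.unique(win_idx):
        y = f0[win_idx == k]
        lo, hi = np.percentile(y, [prct_bl, prct_tl])
        tc.append(t[win_idx == k].mean())   # window time stamp
        bl.append(np.median(y[y <= lo]))    # lower-range median
        ml.append(np.median(y))             # median of all values
        tl.append(np.median(y[y >= hi]))    # upper-range median
    # linear regression through each median sequence
    return {name: np.polyfit(tc, seq, 1)
            for name, seq in (("bl", bl), ("ml", ml), ("tl", tl))}


# synthetic declining contour with local movements
t = np.arange(0, 2, 0.01)
f0 = 200 - 20 * t + 10 * np.sin(2 * np.pi * 10 * t)
lines = register_lines(t, f0)
for name in ("bl", "ml", "tl"):
    slope, intercept = lines[name]
    print(name, round(slope, 2), round(intercept, 2))
```

The F0 level can then be read from the midline and the range from the distance between topline and baseline at any time point.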
styl:loc:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks: For time normalization in the local segment. Currently only minmax supported.

styl:loc:nrm:rng
description: normalized time range
type: list of floats
values, default: [−1, 1]
remarks: Normalized time of segment start and endpoint. [−1, 1] is recommended to center the polynomial around 0.

styl:loc:ord
description: polynomial order
type: int
values, default:
remarks: Each coefficient gets its own output column in the exported tables, thus the table size depends on this order.

styl:register
description: register definition for residual calculation
type: string
values, default: ml, bl, tl, rng, none
remarks: How to remove the global component from the f0 contour to get the residual on which the local contour is calculated. bl, ml, tl: base-, mid-, or topline subtraction; rng: pointwise [0 1] normalization of the f0 contour with respect to the base- and topline. Recommended: ml, rng. rng normalizes for range declination (lower f0 amplitudes at the end of prosodic phrases).

styl:bnd:cross_chunk
description: stylization windows across chunks
type: boolean
values, default:
remarks: If set to 1, the windows defined by styl:bnd:win can cross chunks, else they are limited by the current chunk's boundaries. If set to 1 for do_bnd_trend, lines are fitted from file start and till file end. Else, they are limited by the current chunk's boundaries.

styl:bnd:decl_win
description: window length for median calculation (in sec)
type: float
values, default:
remarks: Within each window one median each for the base-, mid-, and topline is derived.

styl:bnd:nrm:mtd
description: time normalization method
type: string
values, default: minmax
remarks:

Only minmax supported.

styl:bnd:nrm:rng
description: normalized time range
type: list of floats
values, default: [0 ,
remarks: To allow for comparisons independent of segment length, time is normalized to this range.

styl:bnd:prct:bl
description: percentile below which the baseline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of lower range medians is calculated along the f0 contour. The baseline is given by linear regression through this sequence.

styl:bnd:prct:tl
description: percentile above which the topline input medians are calculated
type: float ]0 100[
values, default:
remarks: A sequence of upper range medians is calculated along the f0 contour. The topline is given by linear regression through this sequence.

styl:bnd:residual
description: use f0 residual
type: boolean
values, default:
remarks: Measure discontinuity on the (preprocessed) f0 contour or on its residual after register subtraction.

styl:bnd:win
description: window length (in sec)
type: float
values, default:
remarks: Stylization window length for navigate:do_styl_bnd_win.

styl:gnl:sb:alpha
description: pre-emphasis factor or lower boundary frequency
type: float
values, default:
remarks: For pre-emphasis in the time domain for spectral balance calculation. $0 \le \alpha \le 1$: factor in $s'[i] = s[i] - \alpha \cdot s[i-1]$. $\alpha > 1$: lower boundary frequency from which pre-emphasis should start. Will be internally converted to the factor in the above formula.

styl:gnl:sb:btype
description: filter type to restrict frequency window
type: string
values, default: none, band, high, low
remarks: Restrict frequency window for spectral balance calculation.

styl:gnl:sb:domain
description: domain for spectral balance calculation
type: string
values, default: time, freq
remarks:

Specifies whether spectral balance should be calculated in the time ('time') or frequency ('freq') domain.

styl:gnl:sb:f
description: filter cutoff frequencies (in Hz) for spectral balance calculation
type: float or list of floats
values, default: -1
remarks: Specifies the upper cutoff frequency for a low-pass filter, the lower cutoff frequency for a high-pass filter, or both for a bandpass filter. See styl:gnl:sb:btype.

styl:gnl:sb:win
description: length (in sec) of central analysis window in analysed segment
type: float
values, default: -1
remarks: To be set if coarticulatory influence on spectral balance calculation should be removed. If -1 the entire segment is used.

styl:gnl:win
description: window length to determine initial and final part of contour
type: float
values, default:
remarks: Length of window (in sec) for the initial and final part of the f0 or energy contour, used to calculate mean f0 or energy quotients of these parts and the entire contour.

styl:gnl_en:alpha
description: pre-emphasis factor
type: float
values, default:
remarks: Pre-emphasis is carried out in the time domain as follows: $s'[i] = s[i] - \alpha \cdot s[i-1]$. DEPRECATED! NOW SPECIFIED BY styl:gnl_en:sb:alpha

styl:gnl_en:sts
description: step size (in sec)
type: float
values, default:
remarks: Step size by which the energy window is shifted.

styl:gnl_en:winparam
description: window parameter
type: string or int
values, default: –
remarks: Depends on styl:gnl_en:wintyp; as required by scipy.signal.get_window().

styl:gnl_en:wintyp
description: window type
type: string
values, default: hamming, kaiser, ...
remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:gnl_en:win
description: window length (in sec)
type: float
values, default:
remarks: Energy is calculated in terms of RMSD within windows of this length.

styl:rhy_f0:rhy:lb
description: lower frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Can be raised if low-frequency events should be ignored.

styl:rhy_f0:rhy:nsm
description: number of spectral moments
type: int
values, default:
remarks: How many spectral moments to calculate from the DCT analysis of the f0 contour.

styl:rhy_f0:rhy:rmo
description: remove DCT offset
type: boolean
values, default:
remarks: Remove first DCT coefficient.

styl:rhy_f0:rhy:ub
description: upper frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Upper boundary of the analyzed DCT spectrum (higher-frequency events are assumed not to be influential for prosody).

styl:rhy_f0:rhy:wgt:rb
description: rate band (in Hz)
type: float
values, default:
remarks: Frequency band around the event frequency, within which the influence of the event in terms of absolute DCT coefficient values is integrated. E.g. for an event rate of 4 Hz and a rate band of 1 Hz the absolute values of the DCT coefficients between 3 and 5 Hz are summed up.

styl:rhy_f0:rhy:winparam
description: window parameter
type: string or int
values, default:
remarks: Depends on styl:gnl_en:wintyp; as required by scipy.signal.get_window().

styl:rhy_f0:rhy:wintyp
description: window type for DCT analysis
type: string
values, default: hamming, kaiser, ...
remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:rhy:lb
description: lower frequency boundary of DCT coefficients (in Hz)
type: float
values, default:
remarks: Can be raised if low-frequency events should be ignored.

styl:rhy_en:rhy:nsm
description: number of spectral moments
type: int
values, default:
remarks: How many spectral moments to calculate from the DCT analysis of the energy contour.

styl:rhy_en:rhy:rmo
description: remove DCT offset
type: boolean
values, default:
remarks: Remove first DCT coefficient.
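The rate band integration described for styl:rhy_f0:rhy:wgt:rb can be sketched as follows. This is an illustration under stated assumptions, not the CoPaSul implementation: it uses an orthonormal DCT-II from `scipy.fft`, maps coefficient k of an N-sample contour to the frequency k·fs/(2N), zeroes the first coefficient as a stand-in for styl:rhy_f0:rhy:rmo, and omits windowing. The function name `rate_influence` is hypothetical.

```python
import numpy as np
from scipy.fft import dct


def rate_influence(y, fs, rate, rb=1.0, rmo=True):
    """Sum of absolute DCT coefficients within rate +/- rb (in Hz).

    y    -- contour samples (f0 or energy)
    fs   -- sample rate of the contour (Hz)
    rate -- event rate (Hz), e.g. syllable rate
    rb   -- rate band (Hz)
    rmo  -- if True, discard the first (offset) coefficient
    """
    y = np.asarray(y, dtype=float)
    c = dct(y, type=2, norm="ortho")
    if rmo:
        c[0] = 0.0
    # DCT-II coefficient k corresponds to frequency k * fs / (2 * N)
    freq = np.arange(len(c)) * fs / (2 * len(c))
    band = (freq >= rate - rb) & (freq <= rate + rb)
    return float(np.sum(np.abs(c[band])))


# contour oscillating at 4 Hz, sampled at 100 Hz
fs = 100
t = np.arange(0, 2, 1 / fs)
y = np.sin(2 * np.pi * 4 * t)
at_4hz = rate_influence(y, fs, rate=4, rb=1)    # coefficients in 3..5 Hz
at_10hz = rate_influence(y, fs, rate=10, rb=1)  # coefficients in 9..11 Hz
print(at_4hz > at_10hz)
```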
styl:rhy_en:rhy:ub
  description: upper frequency boundary of DCT coefficients (in Hz)
  type: float
  values, default:
  remarks: Upper boundary of the analyzed DCT spectrum (higher-frequency events are assumed not to be influential for prosody).

styl:rhy_en:rhy:wgt:rb
  description: rate band (in Hz)
  type: float
  values, default:
  remarks: Frequency band around the event frequency, within which the influence of the event in terms of absolute DCT coefficient values is integrated. E.g. for an event rate of 4 Hz and a rate band of 1 Hz the absolute values of the DCT coefficients between 3 and 5 Hz are summed up.

styl:rhy_en:rhy:winparam
  description: DCT window parameter
  type: string or int
  values, default:
  remarks: Depends on styl:rhy_en:rhy:wintyp; as required by scipy.signal.get_window().

styl:rhy_en:rhy:wintyp
  description: window type for DCT
  type: string
  values, default: hamming, kaiser, ...
  remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:sig:scale
  description: scale signal to maximum amplitude 1
  type: boolean
  values, default:
  remarks: If set to 1, the signal is scaled so that its maximum amplitude is 1. This is suggested especially if signals of different recording conditions are to be compared.

styl:rhy_en:sig:sts
  description: step size (in sec)
  type: float
  values, default:
  remarks: Step size by which the energy window is shifted.

styl:rhy_en:sig:winparam
  description: window parameter
  type: string or int
  values, default: –
  remarks: Depends on styl:rhy_en:sig:wintyp; as required by scipy.signal.get_window().

styl:rhy_en:sig:wintyp
  description: window type of energy calculation
  type: string
  values, default: hamming, kaiser, ...
  remarks: All window types that are supported by scipy.signal.get_window() can be used.

styl:rhy_en:sig:win
  description: window length (in sec)
  type: float
  values, default:
  remarks: Energy is calculated in terms of RMSD within windows of this length.

styl:voice:jit:fac_max
  description: maximally allowed quotient of adjacent periods
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Maximum period factor.

styl:voice:jit:t_max
  description: maximum period length in sec
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Period ceiling.

styl:voice:jit:t_min
  description: minimum period length in sec
  type: float
  values, default:
  remarks: Corresponds to the Praat parameter Period floor.

clst:glob:estimate_bandwidth:n_samples
  description: number of samples to estimate bandwidth
  type: int
  values, default:
  remarks: Computationally expensive; high numbers will require long processing time.
clst:glob:estimate_bandwidth:quantile
  description: estimate bandwidth quantile parameter
  type: float
  values, default:
  remarks: Lower values result in higher cluster numbers.

clst:glob:kMeans:init
  description: initialization method of kmeans
  type: string
  values, default: meanShift
  remarks: All methods that are supported by kMeans() can be used. For meanShift the number of clusters does not need to be specified.

clst:glob:kMeans:max_iter
  description: kMeans: maximum number of iterations
  type: int
  values, default:
  remarks: When to stop cluster re-adjustment, if not yet converged.

clst:glob:kMeans:n_cluster
  description: kMeans: predefined number of contour classes
  type: int
  values, default:
  remarks: Irrelevant if kmeans centroids are initialized by clst:glob:kMeans:init=meanShift.

clst:glob:kMeans:n_init
  description: number of initialization trials
  type: int
  values, default:
  remarks: kMeans is repeated with different cluster initializations from which the best clustering result is kept.

clst:glob:meanShift:bandwidth
  description: bandwidth parameter for meanShift cluster center initialization
  type: float
  values, default:
  remarks:

clst:glob:meanShift:bin_seeding
  description: bin seeding
  type: boolean
  values, default:
  remarks: Parameter for meanShift clustering.

clst:glob:meanShift:min_bin_freq
  description: minimum number of items in each bin
  type: int
  values, default:
  remarks: Parameter for meanShift clustering.

clst:glob:mtd
  description: clustering method
  type: string
  values, default: meanShift, kmeans
  remarks: No initial cluster number specification needed for meanShift.

clst:loc:estimate_bandwidth:n_samples
  description: number of samples to estimate bandwidth
  type: int
  values, default:
  remarks: Computationally expensive; high numbers will require long processing time.

clst:loc:estimate_bandwidth:quantile
  description: estimate bandwidth quantile parameter
  type: float
  values, default:
  remarks: Lower values result in higher cluster numbers.

clst:loc:kMeans:init
  description: initialization method of kmeans
  type: string
  values, default: meanShift
  remarks: All methods that are supported by kMeans() can be used. For meanShift the number of clusters does not need to be specified.

clst:loc:kMeans:max_iter
  description: kMeans: maximum number of iterations
  type: int
  values, default:
  remarks: When to stop cluster re-adjustment, if not yet converged.

clst:loc:kMeans:n_cluster
  description: kMeans: predefined number of contour classes
  type: int
  values, default:
  remarks: Irrelevant if kmeans centroids are initialized by clst:loc:kMeans:init=meanShift.

clst:loc:kMeans:n_init
  description: number of initialization trials
  type: int
  values, default:
  remarks: kMeans is repeated with different cluster initializations from which the best clustering result is kept.

clst:loc:meanShift:bandwidth
  description: bandwidth parameter for meanShift cluster center initialization
  type: float
  values, default:
  remarks:

clst:loc:meanShift:bin_seeding
  description: bin seeding
  type: boolean
  values, default:
  remarks: Parameter for meanShift clustering.

clst:loc:meanShift:min_bin_freq
  description: minimum number of items in each bin
  type: int
  values, default:
  remarks: Parameter for meanShift clustering.

clst:loc:mtd
  description: clustering method
  type: string
  values, default: meanShift, kmeans
  remarks: No initial cluster number specification needed for meanShift.

plot:browse:save
  description: save plots according to fsys:pic
  type: boolean
  values, default:
  remarks: Store png files in fsys:pic:dir with file name stem fsys:pic:stm.

plot:browse:single_plot:active
  description: switch on single plot mode
  type: boolean
  values, default:
  remarks: Switch on single plot mode if only one segment, specified by file index, channel index, and segment index, is to be plotted.

plot:browse:single_plot:channel_i
  description: channel index of selected segment
  type: integer
  values, default:
  remarks: Channel index of the selected segment to be plotted.

plot:browse:single_plot:file_i
  description: file index of selected segment
  type: integer
  values, default:
  remarks: File index of the selected segment to be plotted.

plot:browse:single_plot:segment_i
  description: segment index of selected segment
  type: integer
  values, default:
  remarks: Segment index of the selected segment to be plotted.

plot:browse:time
  description: when to do plotting
  type: string
  values, default: online, final
  remarks: online: plot at stylization stage for immediate check of appropriateness of configurations. final: plot segment-wise from the finally stored results. Click on plot: next; press return: quit.
plot:browse:type:clst:contours
  description: plot global and local intonation class centroids
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:gestalt
  description: plot local contour Gestalt stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:superpos
  description: plot global and local contour superposition
  type: boolean
  values, default:
  remarks:

plot:browse:type:glob:decl
  description: plot global contour register stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:loc:acc
  description: plot local contour polynomial stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:loc:decl
  description: plot local contour register stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd
  description: plot boundary stylization
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd_win
  description: plot boundary stylization (fixed window)
  type: boolean
  values, default:
  remarks:

plot:browse:type:complex:bnd_trend
  description: plot boundary stylization (trend)
  type: boolean
  values, default:
  remarks:

plot:browse:type:rhy_en:rhy
  description: plot influence of rate tier events on DCT of energy contour in analysis tier
  type: boolean
  values, default:
  remarks:

plot:browse:type:rhy_f0:rhy
  description: plot influence of rate tier events on DCT of f0 contour in analysis tier
  type: boolean
  values, default:
  remarks:

plot:browse:verbose
  description: display file, channel and segment index for each plot
  type: boolean
  values, default:
  remarks: Written to STDOUT.

plot:color
  description: plot in color (1) or black-white (0)
  type: boolean
  values, default:
  remarks:

plot:grp:grouping
  description: list of selected grouping variables from fsys:grp:lab
  type: list of strings
  values, default: [ ]
  remarks: For each combination of grouping factor levels the stylization plot based on the respective parameter mean vector is stored as a png file in fsys:pic:dir with file name stem fsys:pic:stm and an infix expressing the respective factor level combination.

plot:grp:save
  description: save plots according to fsys:pic
  type: boolean
  values, default:
  remarks: Store png files in fsys:pic:dir with file name stem fsys:pic:stm. One file per group.

plot:grp:type:glob:decl
  description: plot global contour declination centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.

plot:grp:type:loc:acc
  description: plot local contour polynomial shape centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.

plot:grp:type:loc:decl
  description: plot local contour declination centroid for each group
  type: boolean
  values, default:
  remarks: Plots are not displayed but saved as png files to fsys:pic.
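The rate band options (wgt:rb for the f0 and energy rhythm feature sets) described earlier in this section sum absolute DCT coefficient values within a band around an event rate. The following is a minimal sketch of that idea, not the toolkit's own code: the function names are hypothetical, a plain unnormalized DCT-II is used, and the mapping of coefficient index k to the frequency k * sample_rate / (2 * n) is an assumption about how coefficients are assigned to Hz values.

```python
import math

def dct2(x):
    """Unnormalized DCT-II of a sequence (pure Python, O(n^2))."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * k * (t + 0.5) / n) for t in range(n))
        for k in range(n)
    ]

def dct_frequencies(n, sample_rate):
    """Frequency in Hz assigned to each DCT-II coefficient index (assumption)."""
    return [k * sample_rate / (2.0 * n) for k in range(n)]

def rate_band_weight(contour, sample_rate, event_rate, rate_band):
    """Sum of absolute DCT coefficients within event_rate +/- rate_band (in Hz)."""
    coefs = dct2(contour)
    freqs = dct_frequencies(len(contour), sample_rate)
    lo, hi = event_rate - rate_band, event_rate + rate_band
    return sum(abs(c) for c, f in zip(coefs, freqs) if lo <= f <= hi)

# Toy contour: 2 s sampled at 100 Hz, oscillating at 4 Hz
# (exactly the DCT-II basis function with index 16).
contour = [math.cos(math.pi * 16 * (t + 0.5) / 200) for t in range(200)]

w4 = rate_band_weight(contour, 100.0, event_rate=4.0, rate_band=1.0)
w10 = rate_band_weight(contour, 100.0, event_rate=10.0, rate_band=1.0)
print(w4 > w10)
```

With an event rate of 4 Hz and a rate band of 1 Hz the sum covers the coefficients between 3 and 5 Hz, as in the example given for the option; the toy contour therefore receives a large weight at 4 Hz and practically none at 10 Hz.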

12 Output

If fsys:export:csv is set to 1, for each feature set selected by the navigate:* options a csv table file is generated in config:fsys:export:dir. The file name is the underscore-concatenation of config:fsys:export:stm and the feature set name. The extension is csv. Columns are separated by a comma. The column titles correspond to the feature names given in the tables in section 10, and each row corresponds to one segment or event for which the features were extracted. These feature vectors are additionally linked to the data origin by the following columns:

name    description
ci      channel index (starting with 0)
fi      file index (starting with 0)
ii      item (segment or event) index (starting with 0)
stm     annotation file name stem
t_on    time onset
t_off   time offset (same as t_on for events)
tier    tier name

Inter-tier relations are provided by the following columns:

name            description
is_init         initial position in a global segment
is_fin          final position in a global segment
is_init_chunk   initial position in a chunk
is_fin_chunk    final position in a chunk

All of these columns contain the values yes or no. Medial position is simply indicated by is_init=no and is_fin=no. These columns can be used for data subsetting. As an example, let's assume that boundary features were extracted between accent groups, and the global segments correspond to intonation phrases. Then is_fin serves to hold apart IP-final and non-final boundaries. Equivalently, phrase-final and non-final accents can be held apart. is_init_chunk and is_fin_chunk work the same on the chunk level. If no chunk tier is specified, the entire channel is considered to be a single chunk. If no global segment tier is specified, all is_init and is_fin values are set to no.

Finally, if specified by the user, an arbitrary number of grouping columns derived from the file names will be added to the tables. Their names are prefixed by grp_. See the grouping options fsys:grp:* in section 11.3 for details.
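The subsetting by position columns described above can be sketched with Python's standard csv module. This is only an illustration: the table content is made up, the underscore spelling of the column names and the feature column std_ml_rms are assumptions for this example, and a real export would be read from the csv file in the export directory rather than from a string.

```python
import csv
import io

# Toy stand-in for an exported feature table (all values are invented).
TABLE = """ci,fi,ii,stm,t_on,t_off,tier,is_init,is_fin,is_init_chunk,is_fin_chunk,std_ml_rms
0,0,0,rec01,0.50,0.92,ag,yes,no,yes,no,1.2
0,0,1,rec01,0.92,1.48,ag,no,no,no,no,0.8
0,0,2,rec01,1.48,2.10,ag,no,yes,no,yes,3.4
"""

def read_rows(handle):
    """Read a comma-separated table into a list of dicts keyed by column title."""
    return list(csv.DictReader(handle))

def split_by_position(rows, column="is_fin"):
    """Separate rows whose position column is 'yes' from those where it is 'no'."""
    final = [r for r in rows if r[column] == "yes"]
    non_final = [r for r in rows if r[column] == "no"]
    return final, non_final

rows = read_rows(io.StringIO(TABLE))
final, non_final = split_by_position(rows, "is_fin")
print(len(final), "final and", len(non_final), "non-final rows")
```

The same helper can be applied to is_init, is_init_chunk, or is_fin_chunk by passing the respective column name.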
Each table file comes along with an R code template file with the same name and the extension .R to read this table by the R software.

By setting fsys:export:summary to 1 the table output described in section 12.1 can be summarized per file and analysis tier. Summarization for continuous-valued features is done in terms of their mean, median, standard deviation, and inter-quartile range. For categorical features such as intonation contour classes the unigram entropy is calculated. The resulting table is written to the directory fsys:export:dir with the file stem fsys:export:stm plus the suffix summary and the extension csv. Columns are separated by a comma. There is one row of statistic values per analysed tier in a file. Each continuous-valued feature within each analysis tier is represented by four columns. For features of the sets glob and loc, for which there is only one analysis tier, the column names follow the pattern featureSet_featureName_statisticMeasure. The suffixes representing the statistic measurements are listed in the table right below. For features of all other sets with potentially more than one analysis tier the column names are built like this: featureSet_analysisTierName_featureName_statisticMeasure. Categorical features are represented by one column each with the same name building schema.

suffix  meaning               feature type
m       arit. mean            continuous
med     median                continuous
sd      standard deviation    continuous
iqr     inter-quartile range  continuous
h       unigram entropy       categorical

File level groupings, i.e. the grp_* columns of the csv tables described in section 12.1, are copied to the summary table. File and channel index are given in the columns fi and ci, respectively; the file stem is written to column stm.

Next to the csv file an R code template file is generated with the same name and the extension .R to read the summary table by the R software.
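The summary statistics and the column naming schema can be illustrated with a short sketch based on Python's standard library. It is not the toolkit's implementation: the function names are hypothetical, and the sample standard deviation, the quartile method of statistics.quantiles, the base-2 logarithm for the entropy, and the underscore as name separator are all assumptions made for this example.

```python
import math
import statistics
from collections import Counter

def summarize_continuous(values):
    """Mean, median, standard deviation, and inter-quartile range of a feature."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile method: assumption
    return {
        "m": statistics.mean(values),
        "med": statistics.median(values),
        "sd": statistics.stdev(values),  # sample sd: assumption
        "iqr": q3 - q1,
    }

def unigram_entropy(labels):
    """Entropy of the label distribution (base 2: assumption)."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def summary_columns(feature_set, feature, values, tier=None):
    """Build summary column names; the tier part is omitted if tier is None."""
    stem = "_".join(part for part in (feature_set, tier, feature) if part)
    stats = summarize_continuous(values)
    return {f"{stem}_{suffix}": value for suffix, value in stats.items()}

# 'myFeature' and 'myTier' are placeholders, not actual feature or tier names.
cols = summary_columns("glob", "myFeature", [1, 2, 3, 4, 5])
cols_tier = summary_columns("bnd", "myFeature", [1, 2, 3, 4, 5], tier="myTier")
h = unigram_entropy(["a", "a", "b", "b"])
print(sorted(cols), h)
```

Each continuous feature thus yields four columns, while a categorical feature yields a single column with the suffix h.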
The pickle file which is output in config:fsys:export:dir contains a nested dictionary copa for the sake of further processing within other Python projects.

On the top level copa can be subdivided into the sub-dictionaries
• config: configurations underlying the current analysis
• data: extracted features in a structured way described below
• clst: contour clustering results
• val: validation metrics for stylization and clustering

In the subsequent paragraphs all branches through the copasul nested output dictionary are described. The following index key conventions will be used:

fi   file index
ci   channel index
ti   tier index
ii   item (segment or event) index

All indices start with 0, thus channel 1 is represented by index 0, etc. Levels in the dictionary are separated by colons. To give an example for the data sub-dictionary of how to translate this notation into Python code: data:fi:ci:bnd:ti:ii:lab, with index values data:0:0:bnd:0:0:lab, refers to: file 1 : channel 1 : boundary feature set : first tier for which this set was extracted : segment 1 in this tier : label of this segment. In Python this label can be accessed by:

    copa['data'][0][0]['bnd'][0][0]['lab']

Variables to be replaced by annotation-dependent tier names etc. are marked by my*. As an example, data:0:0:rhy_f0:0:0:rate:myTierName* is expanded to one branch for each tier in fsys:rhy_f0:tier_rate referring to channel 1 in fsys:channel (see section 11). Let fsys:rhy_f0:tier_rate = ["syl_1", "syl_2"], of which only the former refers to channel 1, i.e. fsys:channel:syl_1 = 1. Then the corresponding rate value of items in tier syl_1 within file 1, channel 1, segment 1, and analysis tier 1 (fi=ci=ti=ii=0) is addressed in Python by:

    copa['data'][0][0]['rhy_f0'][0][0]['rate']['syl_1']
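The colon notation used in this section can be translated into dictionary look-ups mechanically. The helper below is a hypothetical convenience function, not part of the toolkit; it assumes that index levels are stored under integer keys, as in the access examples above, and the toy dictionary only mimics the structure with invented values.

```python
import pickle

def get_branch(copa, path):
    """Follow a colon-separated branch such as 'data:0:0:bnd:0:0:lab'.

    Purely numeric path elements are treated as integer index keys,
    all other elements as string keys.
    """
    node = copa
    for key in path.split(":"):
        node = node[int(key)] if key.isdigit() else node[key]
    return node

# Toy dictionary mimicking the documented structure (values are invented).
copa = {
    "config": {},
    "data": {0: {0: {
        "bnd": {0: {0: {"lab": "x"}}},
        "rhy_f0": {0: {0: {"rate": {"syl_1": 4.2}}}},
    }}},
}

# A pickle round trip stands in for loading the exported pickle file.
restored = pickle.loads(pickle.dumps(copa))

label = get_branch(restored, "data:0:0:bnd:0:0:lab")
rate = get_branch(restored, "data:0:0:rhy_f0:0:0:rate:syl_1")
print(label, rate)
```

Note that a tier name consisting of digits only would be misread as an index by this helper; in that case the dictionary should be accessed directly.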

This sub-dictionary is accessed by copa['config'] and simply contains a copy of the user-defined and default configurations which are introduced in section 11.

This sub-dictionary is accessed by copa['data'] and can further be subdivided into dictionaries for file information, f0 preprocessing output, chunk segmentation, and feature sets. Time information is always given in seconds and can be accessed by the keys t, tn, to, tt. to always contains the original time values derived from the annotations, while t, tn, and tt values are rounded to the second decimal place to be in sync with f0 values that are sampled at 100 Hz. The semantics of t, tn, to, tt depends on the respective sub-dictionary. In the following all copa['data'] branches will be described in alphabetical order. If a feature variable at the end of a branch is listed in one of the tables in section 10, here only the feature name is given, which can be looked up in these tables.

Boundary features

Boundary features can be extracted within an arbitrary number of tiers. These tiers are indexed by the variable ti. For segment tiers the index ii refers to the segment preceding the boundary.

data:fi:ci:bnd:ti:ii:decl:bl:c
  description: F0 baseline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:bl:x
  description: baseline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:bl:y
  description: stylized baseline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:err
  description: True if top- and baseline cross
  type: boolean
data:fi:ci:bnd:ti:ii:decl:ml:c
  description: F0 midline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:ml:x
  description: midline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:ml:y
  description: stylized midline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:rng:c
  description: F0 range coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:rng:x
  description: range stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:rng:y
  description: stylized range values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tl:c
  description: F0 topline coefficients (descending order)
  type:
data:fi:ci:bnd:ti:ii:decl:tl:x
  description: topline stylization input
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tl:y
  description: stylized topline values
  type: list of floats
data:fi:ci:bnd:ti:ii:decl:tn
  description: normalized time values (same length as bl | ml | rng | tl:y)
  type: list of floats
data:fi:ci:bnd:ti:ii:lab
  description: lab
  type: string
data:fi:ci:bnd:ti:ii:std:bl:aicI
  description: std_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:bl:aicI_post
  description: std_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:aicI_pre
  description: std_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD
  description: std_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD_post
  description: std_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:corrD_pre
  description: std_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:d_m
  description: std_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:bl:d_o
  description: std_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:bl:r
  description: std_bl_r
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms
  description: std_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms_post
  description: std_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rms_pre
  description: std_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR
  description: std_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR_post
  description: std_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:rmsR_pre
  description: std_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_post
  description: std_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_pre
  description: std_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:bl:sd_prepost
  description: std_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI
  description: std_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI_post
  description: std_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:aicI_pre
  description: std_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD
  description: std_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD_post
  description: std_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:corrD_pre
  description: std_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:d_m
  description: std_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:ml:d_o
  description: std_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:ml:r
  description: std_ml_r
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms
  description: std_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms_post
  description: std_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rms_pre
  description: std_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR
  description: std_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR_post
  description: std_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:rmsR_pre
  description: std_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_post
  description: std_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_pre
  description: std_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:ml:sd_prepost
  description: std_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI
  description: std_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI_post
  description: std_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:aicI_pre
  description: std_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD
  description: std_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD_post
  description: std_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:corrD_pre
  description: std_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:d_m
  description: std_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:rng:d_o
  description: std_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:rng:r
  description: std_rng_r
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms
  description: std_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms_post
  description: std_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rms_pre
  description: std_rng_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR
  description: std_rng_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR_post
  description: std_rng_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:rmsR_pre
  description: std_rng_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_post
  description: std_rng_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_pre
  description: std_rng_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:rng:sd_prepost
  description: std_rng_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI
  description: std_tl_aicI
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI_post
  description: std_tl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:aicI_pre
  description: std_tl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD
  description: std_tl_corrD
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD_post
  description: std_tl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:corrD_pre
  description: std_tl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:d_m
  description: std_tl_d_m
  type: float
data:fi:ci:bnd:ti:ii:std:tl:d_o
  description: std_tl_d_o
  type: float
data:fi:ci:bnd:ti:ii:std:tl:r
  description: std_tl_r
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms
  description: std_tl_rms
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms_post
  description: std_tl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rms_pre
  description: std_tl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR
  description: std_tl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR_post
  description: std_tl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:rmsR_pre
  description: std_tl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_post
  description: std_tl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_pre
  description: std_tl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:std:tl:sd_prepost
  description: std_tl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:t
  description: segment tier: time start and end of current segment; event tier: interval from the preceding to the current time stamp (for bnd:...:std features)
  type:
data:fi:ci:bnd:ti:ii:tier
  description: tier name
  type: string
data:fi:ci:bnd:ti:ii:tn
  description: start and end of pre-boundary analysis window, start and end of post-boundary analysis window (for bnd:...:win features)
  type:
data:fi:ci:bnd:ti:ii:to
  description: t non-rounded
  type:
data:fi:ci:bnd:ti:ii:trend:bl:aicI
  description: trend_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:aicI_post
  description: trend_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:aicI_pre
  description: trend_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD
  description: trend_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD_post
  description: trend_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:corrD_pre
  description: trend_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:d_m
  description: trend_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:d_o
  description: trend_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:r
  description: trend_bl_r
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms
  description: trend_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms_post
  description: trend_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rms_pre
  description: trend_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR
  description: trend_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR_post
  description: trend_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:rmsR_pre
  description: trend_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_post
  description: trend_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_pre
  description: trend_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:bl:sd_prepost
  description: trend_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI
  description: trend_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI_post
  description: trend_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:aicI_pre
  description: trend_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD
  description: trend_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD_post
  description: trend_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:corrD_pre
  description: trend_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:d_m
  description: trend_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:d_o
  description: trend_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:r
  description: trend_ml_r
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms
  description: trend_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms_post
  description: trend_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rms_pre
  description: trend_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR
  description: trend_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR_post
  description: trend_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:rmsR_pre
  description: trend_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_post
  description: trend_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_pre
  description: trend_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:ml:sd_prepost
  description: trend_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI
  description: trend_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI_post
  description: trend_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:aicI_pre
  description: trend_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD
  description: trend_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD_post
  description: trend_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:corrD_pre
  description: trend_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:d_m
  description: trend_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:d_o
  description: trend_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:r
  description: trend_rng_r
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms
  description: trend_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms_post
  description: trend_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rms_pre
  description: trend_rng_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR
  description: trend_rng_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR_post
  description: trend_rng_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:rmsR_pre
  description: trend_rng_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_post
  description: trend_rng_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_pre
  description: trend_rng_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:rng:sd_prepost
  description: trend_rng_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI
  description: trend_tl_aicI
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI_post
  description: trend_tl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:aicI_pre
  description: trend_tl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD
  description: trend_tl_corrD
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD_post
  description: trend_tl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:corrD_pre
  description: trend_tl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:d_m
  description: trend_tl_d_m
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:d_o
  description: trend_tl_d_o
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:r
  description: trend_tl_r
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms
  description: trend_tl_rms
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms_post
  description: trend_tl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rms_pre
  description: trend_tl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR
  description: trend_tl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR_post
  description: trend_tl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:rmsR_pre
  description: trend_tl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_post
  description: trend_tl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_pre
  description: trend_tl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:trend:tl:sd_prepost
  description: trend_tl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:tt
  description: start and endpoint for 2 trend windows: from file or chunk start (depending on the styl:bnd:cross_chunk value in the configurations, see section 11) till the end of the pre-boundary segment; from the start of the post-boundary segment till file/chunk end (for bnd:...:trend features)
  type:
data:fi:ci:bnd:ti:ii:win:bl:aicI
  description: win_bl_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:bl:aicI_post
  description: win_bl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:aicI_pre
  description: win_bl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD
  description: win_bl_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD_post
  description: win_bl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:corrD_pre
  description: win_bl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:d_m
  description: win_bl_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:bl:d_o
  description: win_bl_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:bl:r
  description: win_bl_r
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms
  description: win_bl_rms
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms_post
  description: win_bl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rms_pre
  description: win_bl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR
  description: win_bl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR_post
  description: win_bl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:rmsR_pre
  description: win_bl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_post
  description: win_bl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_pre
  description: win_bl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:bl:sd_prepost
  description: win_bl_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI
  description: win_ml_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI_post
  description: win_ml_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:aicI_pre
  description: win_ml_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD
  description: win_ml_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD_post
  description: win_ml_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:corrD_pre
  description: win_ml_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:d_m
  description: win_ml_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:ml:d_o
  description: win_ml_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:ml:r
  description: win_ml_r
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms
  description: win_ml_rms
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms_post
  description: win_ml_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rms_pre
  description: win_ml_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR
  description: win_ml_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR_post
  description: win_ml_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:rmsR_pre
  description: win_ml_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_post
  description: win_ml_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_pre
  description: win_ml_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:ml:sd_prepost
  description: win_ml_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:win:p
  description: p
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI
  description: win_rng_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI_post
  description: win_rng_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:aicI_pre
  description: win_rng_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD
  description: win_rng_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD_post
  description: win_rng_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:corrD_pre
  description: win_rng_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:d_m
  description: win_rng_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:rng:d_o
  description: win_rng_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:rng:r
  description: win_rng_r
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms
  description: win_rng_rms
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms_post
  description: win_rng_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rms_pre
  description: win_rng_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rmsR
  description: win_rng_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rmsR_post
  description: win_rng_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:rmsR_pre
  description: win_rng_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:sd_post
  description: win_rng_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:rng:sd_pre
  description: win_rng_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:rng:sd_prepost
  description: win_rng_sd_prepost
  type: float
data:fi:ci:bnd:ti:ii:win:tl:aicI
  description: win_tl_aicI
  type: float
data:fi:ci:bnd:ti:ii:win:tl:aicI_post
  description: win_tl_aicI_post
  type: float
data:fi:ci:bnd:ti:ii:win:tl:aicI_pre
  description: win_tl_aicI_pre
  type: float
data:fi:ci:bnd:ti:ii:win:tl:corrD
  description: win_tl_corrD
  type: float
data:fi:ci:bnd:ti:ii:win:tl:corrD_post
  description: win_tl_corrD_post
  type: float
data:fi:ci:bnd:ti:ii:win:tl:corrD_pre
  description: win_tl_corrD_pre
  type: float
data:fi:ci:bnd:ti:ii:win:tl:d_m
  description: win_tl_d_m
  type: float
data:fi:ci:bnd:ti:ii:win:tl:d_o
  description: win_tl_d_o
  type: float
data:fi:ci:bnd:ti:ii:win:tl:r
  description: win_tl_r
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rms
  description: win_tl_rms
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rms_post
  description: win_tl_rms_post
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rms_pre
  description: win_tl_rms_pre
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rmsR
  description: win_tl_rmsR
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rmsR_post
  description: win_tl_rmsR_post
  type: float
data:fi:ci:bnd:ti:ii:win:tl:rmsR_pre
  description: win_tl_rmsR_pre
  type: float
data:fi:ci:bnd:ti:ii:win:tl:sd_post
  description: win_tl_sd_post
  type: float
data:fi:ci:bnd:ti:ii:win:tl:sd_pre
  description: win_tl_sd_pre
  type: float
data:fi:ci:bnd:ti:ii:win:tl:sd_prepost
  description: win_tl_sd_prepost
  type: float

Chunks

data:fi:ci:chunk:ii:lab
  description: label
  type: string
data:fi:ci:chunk:ii:t
  description: time start and end
  type:
data:fi:ci:chunk:ii:to
  description: t non-rounded
  type:

F0

data:fi:ci:f0:bv
  description: file/channel related f0 base value
  type: float
data:fi:ci:f0:r
  description: f0 residual after removal of the global f0 component
  type: list of floats
data:fi:ci:f0:t
  description: time stamps
  type: list of floats
data:fi:ci:f0:y
  description: f0 values after preprocessing
  type: list of floats, same length as t

File information

data:fi:ci:fsys:annot:dir
  description: directory of annotation file
  type: string
data:fi:ci:fsys:annot:ext
  description: extension of annotation file
  type: string
data:fi:ci:fsys:annot:lab_chunk
  description: general chunk label
  type: string
data:fi:ci:fsys:annot:lab_pau
  description: general pause label
  type: string
data:fi:ci:fsys:annot:lab_syl
  description: general syllable nucleus label
  type: string
data:fi:ci:fsys:annot:stm
  description: annotation file name stem
  type: string
data:fi:ci:fsys:annot:typ
  description: annotation file type (xml or TextGrid)
  type: string
data:fi:ci:fsys:aud:dir
  description: directory of audio file
  type: string
data:fi:ci:fsys:aud:ext
  description: extension of audio file
  type: string
data:fi:ci:fsys:aud:stm
  description: audio file name stem
  type: string
data:fi:ci:fsys:aud:typ
  description: audio file type
  type: string
data:fi:ci:fsys:augment:channel:myTierName
  description: channel number of each relevant tier myTierName in the annotation. Names of tiers derived by automatic chunking, phrasing, etc. will be added automatically.
  type: int
data:fi:ci:fsys:augment:chunk:tier_out_stm
  description: chunk tier output stem. In the augmented annotation file, the stem is concatenated with the respective channel number.
  type: string
data:fi:ci:fsys:augment:glob:tier
  description: analysis tier name for prosodic boundaries, i.e. tier with prosodic boundary candidates. Max. 1 for each channel!
  type: string
data:fi:ci:fsys:augment:glob:tier_out_stm
  description: prosodic phrase tier output stem. In the augmented annotation file, the stem is concatenated with the respective channel number.
  type: string
data:fi:ci:fsys:augment:glob:tier_parent
  description: name of the parent tier (e.g. chunks), whose boundaries limit the analysis and normalization window boundaries for prosodic phrase extraction
  type: string
data:fi:ci:fsys:augment:lab_chunk
  description: uniform chunk label
  type: string
data:fi:ci:fsys:augment:lab_pau
  description: uniform pause label
  type: string
data:fi:ci:fsys:augment:lab_syl
  description: uniform syllable nucleus label.
Syllable boundaries are derived from this string by concatenating bnd type: string data:ﬁ:ci:fsys:augment:loc:tier accdescription: analysis event tier name for pitch accent detection, i.e. the tier containing the time stamps of pitch accentcandidates (e.g. syllable nuclei). Max 1 for each channel. type: list of strings data:ﬁ:ci:fsys:augment:loc:tier agdescription: analysis segment tier name for pitch accent detection, i.e. the tier containing segments within which maximallyone pitch accent can be realized (e.g. words) type: string data:ﬁ:ci:fsys:augment:loc:tier out stmdescription: pitch accent tier output stem. In the augmented annotation ﬁle, the stem is concatenated with the respectivechannel number type: string data:ﬁ:ci:fsys:augment:loc:tier parentdescription: name of the parent tier (e.g. prosodic phrases), relative to which the accent Gestalt is measured, and whoseboundaries limit the analysis and normalization window boundaries for pitch accent extraction type: string data:ﬁ:ci:fsys:augment:ncdescription: number of channels type: int data:ﬁ:ci:fsys:augment:stmdescription: annotation ﬁle name stem type: string data:ﬁ:ci:fsys:augment:syl:tier out stmdescription: syllable nucleus and boundary tier output stem. In the augmented annotation ﬁle, for the syllable boundarytier bnd is added to the stem, and for both nuclei and boundaries, the stem is concatenated with the respective channelnumber type: string data:ﬁ:ci:fsys:augment:syl:tier parentdescription: name of the parent tier (e.g. chunks), within which reference values are calculated, and whose boundarieslimit the analysis and normalization window boundaries for syllable nucleus detection) type: string data:ﬁ:ci:fsys:bnd:tierdescription: analysis tiers for boundary parameterization. Arbitrary number for each channel. type: list of strings data:ﬁ:ci:fsys:chunk:tierdescription: names of tiers that contain a chunk segmentation (only 1 tier for each channel). 
Names of automaticallygenerated tiers are expanded by channel index ci . type: list of strings data:ﬁ:ci:fsys:f0:dirdescription: f0 ﬁle directory ype: string data:ﬁ:ci:fsys:f0:extdescription: f0 ﬁle extension type: string data:ﬁ:ci:fsys:f0:stmdescription: f0 ﬁle name stem type: string data:ﬁ:ci:fsys:f0:typdescription: f0 ﬁle type type: string data:ﬁ:ci:fsys:glob:tierdescription: analysis tiers for global contour stylization. Max. 1 tier per channel. type: list of strings data:ﬁ:ci:fsys:gnl en:tierdescription: names of analysis tiers for standard energy feature extraction. Any number of tiers per channel supported. type: list of strings data:ﬁ:ci:fsys:gnl f0:tierdescription: names of analysis tiers for standard f0 feature extraction. Any number of tiers per channel supported. type: list of strings data:ﬁ:ci:fsys:loc:tier accdescription: time stamp analysis tiers for the 0-center of normalized time within a local contour segment. Max. 1 tier perchannel. type: list of strings data:ﬁ:ci:fsys:loc:tier agdescription: segment tiers for local contours. Max. 1 tier per channel. type: list of strings data:ﬁ:ci:fsys:rhy en:tierdescription: names of analysis tiers for energy rhythm feature extraction. Any number per channel. type: list of strings data:ﬁ:ci:fsys:rhy en:tier ratedescription: names of rate tiers for energy rhythm feature extraction. Any number per channel. type: list of strings data:ﬁ:ci:fsys:rhy f0:tierdescription: names of analysis tiers for f0 rhythm feature extraction. Any number per channel. type: list of strings data:ﬁ:ci:fsys:rhy f0:tier ratedescription: names of rate tiers for f0 rhythm feature extraction. Any number per channel. 
type: list of strings lobal segment features data:ﬁ:ci:glob:ii:classdescription: class ; global contour class index derived by clustering type: int data:ﬁ:ci:glob:ii:decl:bl:cdescription: bl c1, bl c0 type: data:ﬁ:ci:glob:ii:decl:bl:rdescription: bl r type: ﬂoat data:ﬁ:ci:glob:ii:decl:bl:ratedescription: bl rate type: ﬂoat data:ﬁ:ci:glob:ii:decl:bl:ydescription: stylized f0 baseline values type: list of ﬂoats data:ﬁ:ci:glob:ii:decl:errdescription: True if base and topline crossing type: boolean data:ﬁ:ci:glob:ii:decl:ml:cdescription: ml c1, ml c0 type: data:ﬁ:ci:glob:ii:decl:ml:rdescription: ml r type: ﬂoat data:ﬁ:ci:glob:ii:decl:ml:ratedescription: ml rate type: ﬂoat data:ﬁ:ci:glob:ii:decl:ml:ydescription: stylized f0 midline values type: list of ﬂoats data:ﬁ:ci:glob:ii:decl:rng:cdescription: rng c1, rng c0 type: data:ﬁ:ci:glob:ii:decl:rng:rdescription: rng r type: ﬂoat data:ﬁ:ci:glob:ii:decl:rng:ratedescription: rng rate type: ﬂoat data:ﬁ:ci:glob:ii:decl:rng:ydescription: stylized f0 range values type: list of ﬂoats data:ﬁ:ci:glob:ii:decl:tl:cdescription: tl c1, tl c0 type: data:ﬁ:ci:glob:ii:decl:tl:rdescription: tl r type: ﬂoat data:ﬁ:ci:glob:ii:decl:tl:rate escription: tl rate type: ﬂoat data:ﬁ:ci:glob:ii:decl:tl:ydescription: stylized f0 topline values type: list of ﬂoats data:ﬁ:ci:glob:ii:decl:tndescription: normalized time values (same length as all bl | ml | tl | rng:y ) type: list of ﬂoats data:ﬁ:ci:glob:ii:gnl:durdescription: dur type: ﬂoat data:ﬁ:ci:glob:ii:gnl:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:glob:ii:gnl:mdescription: m type: ﬂoat data:ﬁ:ci:glob:ii:gnl:maxdescription: max type: ﬂoat data:ﬁ:ci:glob:ii:gnl:meddescription: med type: ﬂoat data:ﬁ:ci:glob:ii:gnl:mindescription: min type: ﬂoat data:ﬁ:ci:glob:ii:gnl:sddescription: sd type: ﬂoat data:ﬁ:ci:glob:ii:labdescription: lab type: string data:ﬁ:ci:glob:ii:ridescription: indices of local segments contained in global segment ii type: list of int data:ﬁ:ci:glob:ii:tdescription: global phrase time 
start and end type: data:ﬁ:ci:glob:ii:todescription: t non-rounded type: tandard energy features data:ﬁ:ci:gnl en:ti:ii:labdescription: lab type: string data:ﬁ:ci:gnl en:ti:ii:std:durdescription: dur type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:dur nrmdescription: normalized duration type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:iqr nrmdescription: iqr nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:mdescription: m type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:m nrmdescription: m nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:maxdescription: max type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:max nrmdescription: max nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:meddescription: med type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:med nrmdescription: med nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:mindescription: min type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:min nrmdescription: min nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:rmsdescription: rms type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:rms nrmdescription: rms nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:sbdescription: sb type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:sd escription: sd type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:std:sd nrmdescription: sd nrm type: ﬂoat data:ﬁ:ci:gnl en:ti:ii:tdescription: analysis window start and endpoint type: data:ﬁ:ci:gnl en:ti:ii:tierdescription: tier name related to index ti type: string data:ﬁ:ci:gnl en:ti:ii:tndescription: normalization window start and endpoint type: data:ﬁ:ci:gnl en:ti:ii:todescription: t non-rounded type: list of ﬂoats data:ﬁ:ci:gnl en:ti:ii:ttdescription: trend window (not used) type: list of ﬂoats data:ﬁ:ci:gnl en ﬁle:durdescription: dur type: ﬂoat data:ﬁ:ci:gnl en ﬁle:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:gnl en ﬁle:mdescription: m type: ﬂoat data:ﬁ:ci:gnl en ﬁle:maxdescription: max type: ﬂoat data:ﬁ:ci:gnl en ﬁle:meddescription: med type: ﬂoat data:ﬁ:ci:gnl en ﬁle:mindescription: min type: ﬂoat data:ﬁ:ci:gnl en ﬁle:sddescription: sd type: ﬂoat tandard f0 features data:ﬁ:ci:gnl 
f0:ti:ii:labdescription: lab type: string data:ﬁ:ci:gnl f0:ti:ii:std:durdescription: dur type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:dur nrmdescription: normalized duration type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:iqr nrmdescription: iqr nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:mdescription: m type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:m nrmdescription: m nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:maxdescription: max type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:max nrmdescription: max nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:meddescription: med type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:med nrmdescription: med nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:mindescription: min type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:min nrmdescription: min nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:sddescription: sd type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:std:sd nrmdescription: sd nrm type: ﬂoat data:ﬁ:ci:gnl f0:ti:ii:tdescription: analysis window start and endpoint type: data:ﬁ:ci:gnl f0:ti:ii:tier escription: tier name related to index ti type: string data:ﬁ:ci:gnl f0:ti:ii:tndescription: normalization window start and endpoint type: data:ﬁ:ci:gnl f0:ti:ii:todescription: t non-rounded type: list of ﬂoats data:ﬁ:ci:gnl f0:ti:ii:ttdescription: trend window, not used type: list of ﬂoats data:ﬁ:ci:gnl f0 ﬁle:durdescription: dur type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:mdescription: m type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:maxdescription: max type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:meddescription: med type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:mindescription: min type: ﬂoat data:ﬁ:ci:gnl f0 ﬁle:sddescription: sd type: ﬂoat Grouping data:ﬁ:ci:grp:myGroupVar*description: myGroupvar refers to the grouping variable names speciﬁed in the conﬁguration sub-dictionary fsys:grp forﬁle name based grouping. The values are extracted from the ﬁle name and are always strings type: string

Local segment features data:ﬁ:ci:loc:ii:acc:cdescription: c ∗ ; polynomial coeﬃcients (highest order ﬁrst) type: list of ﬂoats data:ﬁ:ci:loc:ii:acc:tndescription: normalized time values type: list of ﬂoats data:ﬁ:ci:loc:ii:acc:ydescription: polynomial stylization values (same length as tn) type: list of ﬂoats data:ﬁ:ci:loc:ii:classdescription: class ; local contour class index derived by clustering ype: int data:ﬁ:ci:loc:ii:decl:bl:cdescription: bl c1, bl c0 type: data:ﬁ:ci:loc:ii:decl:bl:ratedescription: bl rate type: ﬂoat data:ﬁ:ci:loc:ii:decl:bl:ydescription: stylized f0 baseline values type: list of ﬂoats data:ﬁ:ci:loc:ii:decl:errdescription: True if base- and topline cross type: boolean data:ﬁ:ci:loc:ii:decl:ml:cdescription: ml c1, ml c0 type: data:ﬁ:ci:loc:ii:decl:ml:ratedescription: ml rate type: ﬂoat data:ﬁ:ci:loc:ii:decl:ml:ydescription: stylized f0 midline values type: list of ﬂoats data:ﬁ:ci:loc:ii:decl:rng:cdescription: rng c1, rng c0 type: data:ﬁ:ci:loc:ii:decl:rng:ratedescription: rng rate type: ﬂoat data:ﬁ:ci:loc:ii:decl:rng:ydescription: stylized f0 range values type: list of ﬂoats data:ﬁ:ci:loc:ii:decl:tl:cdescription: tl c1, tl c0 type: data:ﬁ:ci:loc:ii:decl:tl:ratedescription: tl rate type: ﬂoat data:ﬁ:ci:loc:ii:decl:tl:ydescription: stylized f0 topline values type: list of ﬂoats data:ﬁ:ci:loc:ii:decl:tndescription: normalized time values type: list of ﬂoats data:ﬁ:ci:loc:ii:gnl:durdescription: dur type: ﬂoat data:ﬁ:ci:loc:ii:gnl:dur nrmdescription: dur nrm type: ﬂoat ata:ﬁ:ci:loc:ii:gnl:iqrdescription: iqr type: ﬂoat data:ﬁ:ci:loc:ii:gnl:iqr nrmdescription: iqr nrm type: ﬂoat data:ﬁ:ci:loc:ii:gnl:mdescription: m type: ﬂoat data:ﬁ:ci:loc:ii:gnl:m nrmdescription: m nrm type: ﬂoat data:ﬁ:ci:loc:ii:gnl:maxdescription: max type: ﬂoat data:ﬁ:ci:loc:ii:gnl:max nrmdescription: max nrm type: ﬂoat data:ﬁ:ci:loc:ii:gnl:meddescription: med type: ﬂoat data:ﬁ:ci:loc:ii:gnl:med nrmdescription: med nrm type: ﬂoat data:ﬁ:ci:loc:ii:gnl:mindescription: min type: 
ﬂoat data:ﬁ:ci:loc:ii:gnl:min nrmdescription: min nrm type: ﬂoat data:ﬁ:ci:loc:ii:gnl:sddescription: sd type: ﬂoat data:ﬁ:ci:loc:ii:gnl:sd nrmdescription: sd nrm type: ﬂoat data:ﬁ:ci:loc:ii:gst:bl:d ﬁndescription: bl d ﬁn type: ﬂoat data:ﬁ:ci:loc:ii:gst:bl:d initdescription: bl d init type: ﬂoat data:ﬁ:ci:loc:ii:gst:bl:rmsdescription: bl rms type: ﬂoat data:ﬁ:ci:loc:ii:gst:bl:sddescription: bl sd type: ﬂoat data:ﬁ:ci:loc:ii:gst:ml:d ﬁn escription: ml d ﬁn type: ﬂoat data:ﬁ:ci:loc:ii:gst:ml:d initdescription: ml d init type: ﬂoat data:ﬁ:ci:loc:ii:gst:ml:rmsdescription: ml rms type: ﬂoat data:ﬁ:ci:loc:ii:gst:ml:sddescription: ml sd type: ﬂoat data:ﬁ:ci:loc:ii:gst:residual:bl:cdescription: c ∗ ; local contour coefs in descending order. Polynomial ﬁtted on residual after baseline subtraction type: list of ﬂoats data:ﬁ:ci:loc:ii:gst:residual:ml:cdescription: c ∗ ; local contour coefs in descending order. Polynomial ﬁtted on residual after midline subtraction type: list of ﬂoats data:ﬁ:ci:loc:ii:gst:residual:rng:cdescription: c ∗ ; local contour coefs in descending order. Polynomial ﬁtted on residual after range normalization type: list of ﬂoats data:ﬁ:ci:loc:ii:gst:residual:tl:cdescription: c ∗ ; local contour coefs in descending order. 
Polynomial fitted on residual after topline subtraction type: list of floats

data:fi:ci:loc:ii:gst:rng:d_fin
description: rng_d_fin
type: float

data:fi:ci:loc:ii:gst:rng:d_init
description: rng_d_init
type: float

data:fi:ci:loc:ii:gst:rng:rms
description: rng_rms
type: float

data:fi:ci:loc:ii:gst:rng:sd
description: rng_sd
type: float

data:fi:ci:loc:ii:gst:tl:d_fin
description: tl_d_fin
type: float

data:fi:ci:loc:ii:gst:tl:d_init
description: tl_d_init
type: float

data:fi:ci:loc:ii:gst:tl:rms
description: tl_rms
type: float

data:fi:ci:loc:ii:gst:tl:sd
description: tl_sd
type: float

data:fi:ci:loc:ii:is_fin
description: is_fin (yes, no)
type: string

data:fi:ci:loc:ii:is_fin_chunk
description: is_fin_chunk (yes, no)
type: string

data:fi:ci:loc:ii:is_init
description: is_init (yes, no)
type: string

data:fi:ci:loc:ii:is_init_chunk
description: is_init_chunk (yes, no)
type: string

data:fi:ci:loc:ii:lab_acc
description: lab_pnt; label from local event tier
type: string

data:fi:ci:loc:ii:lab_ag
description: lab; label from local segment tier
type: string

data:fi:ci:loc:ii:ri
description: index of global parent segment
type: int

data:fi:ci:loc:ii:t
description: analysis window starting point, endpoint, and center. For only event tier input, these values are given by start and end of a symmetric window around each time stamp, and the time stamp itself. For only segment tier input, start- and endpoint are given by the segment's on- and offset, and the center corresponds to the segment's midpoint. For both event and segment tier input, start- and endpoint are given by the segment's on- and offset, and the center by the event's time stamp.
type:

data:fi:ci:loc:ii:tn
description: normalization window start and endpoint to normalize f0 standard features
type:

data:fi:ci:loc:ii:to
description: original input time values (1 for events, 2 for segments, 3 for events+segments)
type: 1, 2 or 3 element list of floats

Event rates

data:fi:ci:rate:myTierName*
description: overall rate, measured within the file, of the events or segments in each tier specified in the configuration by rhy_f0:tier_rate
type: float

Energy rhythm features data:ﬁ:ci:rhy en:ti:ii:labdescription: lab type: string data:ﬁ:ci:rhy en:ti:ii:rate:myTierName*description: myRateTier rate ; rate of items in myTierName within the current interval ii of tier with index ti type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:cdescription: DCT coeﬃcients in user deﬁned frequency band type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:c origdescription: all DCT coeﬃcients type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:cbin escription: summed DCT coefs within frequency bins type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:durdescription: dur type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:fdescription: frequencies of DCT coefs in c (same length as c ) type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:f origdescription: frequencies of DCT coef in c orig (same length as c orig ) type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:fbindescription: lower boundaries of frequency bins (same length as cbin ) type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:mdescription: weighted coeﬃcient mean type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:maedescription: mean absolute error between IDCT and original contour type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:sddescription: weighted coeﬃcient standard deviation type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:smdescription: sm ∗ ; spectral moments of DCT coefs type: list of ﬂoats; length depends on conﬁg branch styl:rhy en:rhy:nsm data:ﬁ:ci:rhy en:ti:ii:rhy:wgt:myTierName*:maedescription: myAnalysisTier myRateTier mae ; mean absolute error between original contour and IDCT of coeﬃcientsaround the rate of the items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:wgt:myTierName*:propdescription: myAnalysisTier myAnalysisTier prop ; proportion of the coeﬃcient weights around the rate of the items in tier myTierName relative to coeﬃcients’ overall sum type: ﬂoat data:ﬁ:ci:rhy en:ti:ii:rhy:wgt:myTierName*:ratedescription: myAnalysisTier myAnalysisTier rate ; rate of the items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy 
en:ti:ii:ri:myTierName*description: item indices in tier myTierName that fall within the segment ii of tier with index ti type: list of int data:ﬁ:ci:rhy en:ti:ii:tdescription: segment ii start and endpoint type: data:ﬁ:ci:rhy en:ti:ii:tierdescription: tier name related to index ti type: string data:ﬁ:ci:rhy en:ti:ii:tndescription: as t, irrelevant type: data:ﬁ:ci:rhy en:ti:ii:to escription: t non-rounded type: data:ﬁ:ci:rhy en:ti:ii:ttdescription: segment ii start, mid, and endpoint; irrelevant type: data:ﬁ:ci:rhy en ﬁle:cdescription: DCT coeﬃcients in user deﬁned frequency band type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:c origdescription: all DCT coeﬃcients type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:cbindescription: summed DCT coefs within frequency bins type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:durdescription: dur type: ﬂoat data:ﬁ:ci:rhy en ﬁle:fdescription: frequencies of DCT coefs in c (same length as c ) type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:f origdescription: frequencies of DCT coef in c orig (same length as c orig ) type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:fbindescription: lower boundaries of frequency bins type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:mdescription: weighted coeﬃcient mean type: string data:ﬁ:ci:rhy en ﬁle:maedescription: mean absolute error between IDCT and original contour type: ﬂoat data:ﬁ:ci:rhy en ﬁle:sddescription: weighted coeﬃcient standard deviation type: ﬂoat data:ﬁ:ci:rhy en ﬁle:smdescription: sm ∗ ; spectral moments of DCT coefs type: list of ﬂoats data:ﬁ:ci:rhy en ﬁle:wgt:myTierName*:maedescription: myAnalysisTier mae ; mean absolute error between original contour and IDCT of coeﬃcients around the rateof the items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy en ﬁle:wgt:myTierName*:propdescription: myAnalysisTier prop ; proportion of the coeﬃcient weights around the rate of the items in tier myTierName relative to coeﬃcients’ overall sum type: ﬂoat data:ﬁ:ci:rhy en ﬁle:wgt:myTierName*:ratedescription: myAnalysisTier rate ; rate of the 
items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:labdescription: lab type: string data:ﬁ:ci:rhy f0:ti:ii:rate:myTierName*description: myAnalysisTier rate ; rate of items in myTierName within the current interval ii of tier with index ti type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:cdescription: DCT coeﬃcients in user deﬁned frequency band type: list of ﬂoats data:ﬁ:ci:rhy en:ti:ii:rhy:c origdescription: all DCT coeﬃcients type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:cbindescription: summed DCT coefs within frequency bins type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:durdescription: dur type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:fdescription: frequencies of DCT coefs in c (same length as c ) type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:f origdescription: frequencies of DCT coef in c orig (same length as c orig ) type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:fbindescription: lower boundaries of frequency bins type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:mdescription: weighted coeﬃcient mean type: string data:ﬁ:ci:rhy f0:ti:ii:rhy:maedescription: mean absolute error between IDCT and original contour type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:sddescription: weighted coeﬃcient standard deviation type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:smdescription: sm ∗ ; spectral moments of DCT coefs type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:maedescription: myAnalysisTier myAnalysisTier mae ; mean absolute error between original contour and IDCT of coeﬃcientsaround the rate of the items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:propdescription: myAnalysisTier myAnalysisTier prop ; proportion of the coeﬃcient weights around the rate of the items in tier myTierName relative to coeﬃcients’ overall sum type: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:rhy:wgt:myTierName*:ratedescription: myAnalysisTier myAnalysisTier rate ; rate of the items in tier myTierName ype: ﬂoat data:ﬁ:ci:rhy f0:ti:ii:ri:myTierNamedescription: item indices in tier myTierName that 
fall within the segment ii of tier with index ti type: list of int data:ﬁ:ci:rhy f0:ti:ii:tdescription: segment ii start and endpoint type: data:ﬁ:ci:rhy f0:ti:ii:tierdescription: tier name related to index ti type: string data:ﬁ:ci:rhy f0:ti:ii:tndescription: as t, irrelevant type: data:ﬁ:ci:rhy f0:ti:ii:todescription: t non-rounded type: list of ﬂoats data:ﬁ:ci:rhy f0:ti:ii:ttdescription: segment ii start, mid, and endpoint; irrelevant type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:cdescription: DCT coeﬃcients in user deﬁned frequency band type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:c origdescription: all DCT coeﬃcients type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:cbindescription: summed DCT coefs within frequency bins type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:durdescription: dur type: ﬂoat data:ﬁ:ci:rhy f0 ﬁle:fdescription: frequencies of DCT coefs in c (same length as c ) type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:f origdescription: frequencies of DCT coef in c orig (same length as c orig ) type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:fbindescription: lower boundaries of frequency bins type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:mdescription: weighted coeﬃcient mean type: string data:ﬁ:ci:rhy f0 ﬁle:maedescription: mean absolute error between IDCT and original contour type: ﬂoat data:ﬁ:ci:rhy f0 ﬁle:sddescription: weighted coeﬃcient standard deviation type: ﬂoat ata:ﬁ:ci:rhy f0 ﬁle:smdescription: sm ∗ ; spectral moments of DCT coefs type: list of ﬂoats data:ﬁ:ci:rhy f0 ﬁle:wgt:myTierName*:maedescription: myAnalysisTier mae ; mean absolute error between original contour and IDCT of coeﬃcients around the rateof the items in tier myTierName type: ﬂoat data:ﬁ:ci:rhy f0 ﬁle:wgt:myTierName*:propdescription: myAnalysisTier prop ; proportion of the coeﬃcient weights around the rate of the items in tier myTierName relative to coeﬃcients’ over all sum type: ﬂoat data:ﬁ:ci:rhy f0 ﬁle:wgt:myTierName*:ratedescription: myAnalysisTier rate ; rate of the items in tier myTierName type: ﬂoat This 
sub-dictionary is accessed by copa['clst'] and contains the outcome of the clustering of global and local contours (cf. sections 8.5.3 and 8.7.3).

Global contour classes

clst:glob:c
description: slope coefficient matrix to be clustered
type: 2d array of floats

clst:glob:cntr
description: centroid matrix, one row per contour class
type: 2d array of floats

clst:glob:ij
description: location of each feature vector in copa:data:fi:ci:glob:ii; each row contains 3 indices for the file fi, the channel ci, and the global segment ii, respectively
type: 2d array of int

clst:glob:obj
description: clustering object which was used for global contour clustering
type: object

clst:glob:val
description: mean silhouette
type: float
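The relation between clst:glob:ij and the data branch can be illustrated with a small sketch. It uses a hand-made toy dictionary in place of a real copa structure, plain nested lists instead of arrays, and it assumes that the class index stored in data:fi:ci:glob:ii:class selects the corresponding row of clst:glob:cntr. The function name segments_by_class is hypothetical and not part of the toolkit.

```python
def segments_by_class(copa):
    """Group global segments by their contour class index.

    Each row of clst:glob:ij holds (file index, channel index, segment index),
    which is used here to look the segment up in the data branch.
    """
    groups = {}
    for fi, ci, ii in copa["clst"]["glob"]["ij"]:
        segment = copa["data"][fi][ci]["glob"][ii]
        groups.setdefault(segment["class"], []).append((fi, ci, ii))
    return groups


# Toy structure for illustration only; the values are invented.
copa = {
    "data": {0: {0: {"glob": {0: {"class": 1}, 1: {"class": 0}, 2: {"class": 1}}}}},
    "clst": {
        "glob": {
            "ij": [[0, 0, 0], [0, 0, 1], [0, 0, 2]],
            "cntr": [[-0.8], [-0.1]],  # assumed: one centroid row per class index
        }
    },
}

groups = segments_by_class(copa)
for cls in sorted(groups):
    print(cls, copa["clst"]["glob"]["cntr"][cls], groups[cls])
```

The same pattern would apply to clst:loc:ij and the loc branch.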

Local contour classes

clst:loc:c
description: polynomial coefficient matrix to be clustered
type: list of floats

clst:loc:cntr
description: centroid matrix, one row per contour class
type: list of strings

clst:loc:ij
description: location of each feature vector in copa:data:fi:ci:loc:ii; each row contains 3 indices for the file, the channel, and the local segment, respectively
type: list of floats

clst:loc:obj
description: clustering object which was used for local contour clustering
type: string

clst:loc:val
description: mean silhouette
type: float

This sub-dictionary is accessed by copa['val'] and contains validation measures for stylization and clustering.

Clustering

val:clst:glob:sil_mean
description: mean silhouette of all data points for global contour clustering
type: float

val:clst:loc:sil_mean
description: mean silhouette of all data points for local contour clustering
type: float

Stylization

val:styl:glob:err_prop
description: proportion of base/topline crossings over all global segments
type: float

val:styl:loc:rms_mean
description: mean RMSD between original and stylized local contour
type: float

Three types of f0 tables can be exported:
• preprocessed f0
• residual f0 (after removal of the global register component)
• resynthesized f0 (superposition of global and local stylized component)

As in the f0 table input format, in each output table the first column gives the time stamps, and the second through last columns contain the f0 values (in Hz) for the recording channels. The tables are stored below fsys:export:dir in sub-directories named after the type of f0 output (f0_preproc, f0_residual, f0_resyn). For each input f0 file an output file with the same name is generated.

The log file in fsys:export:dir + fsys:export:stm + log.txt contains warnings, information about too short segments to be skipped, and some validations below the line '
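The column layout of the exported f0 tables (time stamps first, then one f0 column per recording channel) can be read back with a few lines of Python. This is a minimal sketch that assumes whitespace-separated columns, which the text does not state explicitly; the function name read_f0_table and the sample values are hypothetical.

```python
def read_f0_table(lines):
    """Split f0 table rows into a time vector and one f0 list per channel.

    Assumes whitespace-separated numeric columns: time stamp first,
    followed by the f0 values (in Hz) of each recording channel.
    """
    time, channels = [], []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        values = [float(x) for x in fields]
        time.append(values[0])
        for k, f0 in enumerate(values[1:]):
            if k == len(channels):
                channels.append([])
            channels[k].append(f0)
    return time, channels


# Invented two-channel sample rows.
sample = ["0.00 110.2 0.0", "0.01 111.0 98.5", ""]
time, channels = read_f0_table(sample)
print(time)
print(channels)
```

For a file on disk, the same function can be applied to the open file handle, since iterating over it yields lines.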

13 Plotting

To activate plotting, set navigate:do_plot=1.

Browsing

Browsing through stylizations can be carried out online (in order to check for appropriate stylization parameter settings) or after feature extraction, which is controlled by plot:browse:time.

To select the stylization to be plotted, the corresponding branches in plot:browse:typ:*:* need to be set to 1. E.g. plot:browse:typ:complex:superpos=1 produces plots as in Figure 5.

Grouping

One can also plot stylizations based on parameter centroids for a specified grouping. By plot:grp:typ:*:*=1 the user selects the stylization to be plotted. The grouping is defined by plot:grp:grouping.

The entries in this list can be lab for item labels or the grouping factor names specified in fsys:grp:lab. Centroids will be plotted for each factor level combination.

Browsing and grouping plots can be saved as .png files by
plot:browse:save=1
plot:grp:save=1

The browse mode output file names are the concatenation of fsys:pic:dir + fsys:pic:stm + final|online + typ + set + fileIndex + channelIndex + tierName + itemIndex. typ and set refer to the *-keys in plot:browse:typ:*:* set to 1.

The grouping mode output file names are concatenated from fsys:pic:dir + fsys:pic:stm + factorLevelCombination. One file is generated for each factor level combination.
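The plotting branches named in this section can be collected into a configuration dictionary as sketched below. The sketch assumes that the colon notation denotes nested dictionary keys of the JSON configuration, that key names use underscores (do_plot), and that "final" is an accepted value of plot:browse:time; the helper set_branch is hypothetical and not part of the toolkit.

```python
import json


def set_branch(config, branch, value):
    """Set a value in a nested dict addressed by a colon-separated branch."""
    keys = branch.split(":")
    node = config
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate levels on demand
    node[keys[-1]] = value
    return config


config = {}
set_branch(config, "navigate:do_plot", 1)
set_branch(config, "plot:browse:time", "final")  # assumed value
set_branch(config, "plot:browse:typ:complex:superpos", 1)
set_branch(config, "plot:browse:save", 1)
set_branch(config, "plot:grp:save", 1)

print(json.dumps(config, indent=2))
```

The printed JSON fragment would still have to be merged into a complete configuration file.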

14 Known bugs

Not all missing or wrong configurations are reported in the log file yet; some may instead result in a Python error message. The same holds for unexpected annotations. If you cannot locate the error, you can send the configuration file, the log file, and the error message to: [email protected]

An error message that includes the line return json.load(h) results from a wrongly formatted JSON configuration file. The last line of the error message points you to the erroneous line in the JSON file.

Currently (September 24th, 2018), depending on your scipy and numpy versions, scipy.signal may cause a FutureWarning: Using a non-tuple sequence for multidimensional indexing is deprecated. This warning can be ignored.
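A wrongly formatted JSON configuration can also be detected before running the toolkit. The following sketch uses only Python's standard json module; the function name check_json_config and the sample strings are hypothetical, and the check covers JSON syntax only, not the validity of configuration branches.

```python
import json


def check_json_config(text):
    """Return None if text is valid JSON, else a short error location string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return "line {}, column {}: {}".format(err.lineno, err.colno, err.msg)
    return None


good = '{\n  "navigate": {"do_plot": 1}\n}'
bad = '{\n  "a": 1,\n  "b" 2\n}'  # missing colon on line 3

print(check_json_config(good))
print(check_json_config(bad))
```

To check a file, read its content with open(path).read() and pass the resulting string to the function.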

15 History

In the following, only those updates directly relevant for the user are documented, that is, configuration and/or feature set updates. For a documentation of all remaining updates please see the history.txt file, which is part of the code distribution.

Version 0.2.1, December 30th, 2016

Batch clustering for phrase boundary and pitch accent extraction.

In addition to clustering of boundary/pitch accent candidates on the file level, clustering is now also supported on the entire dataset level. This is expected to improve the prosodic structure extraction in short files (e.g. backchannel turns in dialog data) that do not contain enough material for clustering. Dataset vs. file-level clustering is selected by the configuration branches:
augment:glob:unit
augment:loc:unit

See sections 7.3 and 7.4 for further details.

Phone duration feature for phrase boundary and pitch accent extraction.

In case a phone segment tier is available, z-scored vowel length can be used as a feature for phrase boundary and pitch accent detection. The length of the vowel associated with the prosodic event candidate is divided by its mean length derived from the entire dataset. For boundary candidates the associated vowel is the last vowel segment with an onset before the boundary candidate time stamp. For accent candidates the associated vowel interval includes the candidate time stamp. This feature will be beneficial for languages in which phrase boundaries and/or accents are marked by phone segment lengthening. See sections 7.3 and 7.4 for details. The related new configuration branches are:

augment:glob:wgt:pho : add vowel length to boundary feature set
augment:loc:wgt:pho : add vowel length to accent feature set
fsys:pho:tier : name(s) of phone segment tier(s)
fsys:pho:vow : vowel pattern
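The vowel selection rules described above can be illustrated as follows. This is a minimal sketch and not the toolkit's code: vowels are assumed to be given as (onset, offset) tuples in seconds, the function names are hypothetical, and the mean length is assumed to be supplied by the caller.

```python
def vowel_for_boundary(vowels, t):
    """Last vowel interval (onset, offset) whose onset lies before time stamp t."""
    before = [v for v in vowels if v[0] < t]
    return max(before, key=lambda v: v[0]) if before else None


def vowel_for_accent(vowels, t):
    """Vowel interval that includes time stamp t, or None if there is none."""
    for onset, offset in vowels:
        if onset <= t <= offset:
            return (onset, offset)
    return None


def normalized_length(vowel, mean_length):
    """Vowel length divided by a mean length derived from the dataset."""
    return (vowel[1] - vowel[0]) / mean_length
```

For example, with vowels at (0.1, 0.2), (0.5, 0.7), and (1.0, 1.1), a boundary candidate at 0.9 s is associated with the vowel (0.5, 0.7), and an accent candidate at 0.3 s has no associated vowel.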

Version 0.3.1, January 31st, 2017

Synchronize locations where to extract loc, loc_ext, gnl_f0|en, and rhy_f0|en feature sets

Due to the strict hierarchy principle and to window length constraints it is not always possible to extract loc features at any location where gnl and rhy features can be obtained. If the user is interested only in locations where all these feature sets are available, so that the corresponding feature matrices can be concatenated, then the option preproc:loc_sync is to be set to 1.

Transforming events to segments separately for each feature set

In addition to the global window length setting in

opt:preproc:point_win
opt:preproc:nrm_win

window lengths can be set individually for each of the feature sets loc(_ext), gnl_f0, gnl_en, rhy_f0, rhy_en by specifying:

preproc:myFeatureSet:point_win
preproc:myFeatureSet:nrm_win

New design of rhy_f0 and rhy_en output tables

All myAnalysisTier_myRateTier_myParameter column names are renamed to myRateTier_myParameter. The analysis tier name can be read from the tier column. By this the analysis tier can be used as a grouping factor. Analysis/rate tier combinations across recording channels are not considered. Thus, cells in myRateTier_myParameter columns are set to NA if myRateTier and the analysis tier of the respective row are not derived from the same channel.

New fullpath switch for R code files

If set to 0, only the file stem is output. For 1, the full path is written.

fsys:export:fullpath

Version 0.4.1, May 23rd, 2017

New global segment register features
• {bl,ml,tl,rng}_rate: base-/mid-/topline/range rate (ST or Hz per sec)

New local segment register features (extended feature set)
• {bl,ml,tl,rng}_{c0,c1,rate}: base-/mid-/topline/range intercept, slope, and rate (ST or Hz per sec)

New option for rhythm analyses of energy contours
• styl:rhy_en:sig:scale: if set to 1, the signal is scaled to its maximum amplitude. This is suggested especially if signals of different recording conditions are to be compared.

Version 0.4.2, July 18th, 2017

New options for selected segment plotting and index printing
• plot:browse:single_plot:active
• plot:browse:single_plot:file_i
• plot:browse:single_plot:channel_i
• plot:browse:single_plot:segment_i
• plot:browse:verbose
• plot:color

Version 0.4.3, September 5th, 2017

Time stamps added to boundary feature output

Columns t_off and t_on added to boundary features output. t_off gives the end time of the pre-boundary segment, t_on the start time of the post-boundary segment.

Version 0.5.1, October 12th, 2017

Summary table output

Output of a csv summary file that contains mean and variance values for each feature per file and channel.

fsys:export:summary

Version 0.6.1, November 20th, 2017

F0 reference value by any grouping variable

So far the F0 reference value for semitone conversion was calculated separately for each file and each channel. Now it can be calculated for each level of a grouping variable (most relevant: speaker ID), which can be read from the file name as specified in fsys:grp.

preproc:base_prct_grp

Version 0.6.2, November 23rd, 2017

New features for sets glob and loc

Mean values for base-, mid-, topline, and range (bl_m, ml_m, tl_m, rng_m).

Version 0.6.3, December 11th, 2017

New features for sets rhy_en and rhy_f0

Number of peaks in absolute DCT spectrum (n_peak), corresponding frequency for coefficient with amplitude maximum (f_max), and frequency difference for each selected prosodic event rate to f_max (dgm) and to the nearest peak (dlm).

Version 0.7.1, January 10th, 2018

New features for sets gnl_en and gnl_f0

F0 and energy quotients: mean(initPart)/mean(nonInit) (qi), mean(finalPart)/mean(nonFinal) (qf), mean(initialPart)/mean(finalPart) (qb), mean(maxPart)/mean(nonMax) (qm). 2nd order polynomial fit through contour (c0, c1, c2). Option: styl:gnl:win to determine length of initial and final part (in seconds).

Version 0.7.3, January 18th, 2018

New outlier definition and default

Now also Tukey's fences outlier definition is supported. The default reference now is set to mean instead of median.

Version 0.7.4, January 19th, 2018

Event tier support for global segments

In event tiers the time points are treated as right boundaries of global segments. See section 8.5.1.
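Three of the updates listed above (group-wise F0 reference values for semitone conversion in version 0.6.1, the quotients qi, qf, and qb in version 0.7.1, and Tukey's fences in version 0.7.3) can be illustrated with the following sketch. It is not taken from the toolkit. All function names are hypothetical, the reference value is assumed here to be a plain percentile of the group's F0 values, the initial and final parts are given in samples rather than seconds, and the fence factor 1.5 is the conventional choice; the toolkit's actual computations may differ.

```python
import math
import statistics


def reference_by_group(f0_by_group, prct=5):
    """One reference value per group level (assumed: prct-th percentile, 1..99)."""
    refs = {}
    for group, values in f0_by_group.items():
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        refs[group] = cuts[prct - 1]
    return refs


def hz_to_semitones(f0_hz, ref_hz):
    """Convert F0 values in Hz to semitones relative to ref_hz."""
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def part_quotients(y, n_part):
    """qi, qf, qb of contour y; requires 0 < n_part < len(y)."""
    m_init = statistics.fmean(y[:n_part])
    m_non_init = statistics.fmean(y[n_part:])
    m_fin = statistics.fmean(y[-n_part:])
    m_non_fin = statistics.fmean(y[:-n_part])
    return m_init / m_non_init, m_fin / m_non_fin, m_init / m_fin


def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v < lower or v > upper for v in values]
```

With a speaker ID as grouping variable, reference_by_group would be called once on all F0 values of each speaker, and the resulting value passed to hz_to_semitones for every file of that speaker.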

Version 0.7.5, January 23rd, 2018

Chunking update
• default for augment:chunk:e_rel changed to 0.1
• fallback RMS over entire channel content now calculated on absolute amplitude values greater than the median. This prevents too many extracted chunks in signals that consist mainly of speech pauses.

Version 0.7.6, January 25th, 2018

Chunking update

Silence margins can be set at chunk starts and ends.

augment:chunk:margin
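The fallback RMS calculation introduced in version 0.7.5 can be sketched as follows. This is an illustration under assumptions, not the toolkit's implementation: the function name is hypothetical, the median is assumed to be taken over the absolute amplitude values, and the treatment of a constant-magnitude signal is a choice made for this sketch only.

```python
import math
import statistics


def fallback_rms(samples):
    """RMS over the absolute amplitude values that exceed their median."""
    abs_vals = [abs(s) for s in samples]
    med = statistics.median(abs_vals)
    loud = [a for a in abs_vals if a > med]
    if not loud:
        # constant-magnitude signal: no value exceeds the median, use all
        loud = abs_vals
    return math.sqrt(sum(a * a for a in loud) / len(loud))
```

For a signal that is mostly silence, the RMS over all samples is close to zero, so a threshold defined relative to it would be exceeded almost everywhere. Restricting the calculation to values above the median raises the reference level.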

Version 0.7.7, January 30th, 2018

The column separator of output tables can now be chosen freely.

fsys:export:sep

Version 0.7.9, April 5th, 2018

Spectral balance calculation update
• now can be carried out in spectral and time domain
• time window in the center of the analysed segment and frequency window can be specified
• for time domain analysis α can be specified as factor or as lower boundary frequency for pre-emphasis.
• styl:gnl_en:alpha is deprecated and replaced by styl:gnl_en:sb:alpha

styl:gnl_en:sb

Version 0.7.12, June 26th, 2018

New position features of local segments in global ones
• loc feature table now contains two more columns is_init and is_fin, both describing the position of the local segment in the global one.

Version 0.8.1, July 3rd, 2018

"voice" feature set for voice quality added
• for shimmer and jitter features: mean values and 3rd order polynomial stylization to capture changes of these variables over time. New features jit, jit_c[0−…], shim, shim_c[0−…]
• to extract these features pulse files need to be extracted beforehand, e.g. by using the added Praat scripts extract_pulse.praat and extract_pulse_stereo.praat

navigate:do_styl_voice
styl:voice
fsys:pulse
fsys:voice:tier

Version 0.8.5, August 2nd, 2018

Boundary features now can also be calculated on f0 residual contour
• useful e.g. to normalize subsequent accent groups by their respective register in an IP
• Beware: not meaningful across IP boundaries. To be filtered by column is_fin (see next paragraph)

styl:register
styl:bnd:residual

All feature sets: new columns marking initial and/or final position of items within global segments and chunks
• extension of update to version 0.7.12.
• all feature tables now contain the columns is_init, is_fin, is_init_chunk, and is_fin_chunk. The former two describe the position of the current item (e.g. local segment, segment boundary etc.) in the global segment. The latter two locate the current item within the underlying chunk. If no chunk tier is specified, is_fin_chunk and is_init_chunk give the item's position in the entire channel. If no global segment tier is specified, is_init and is_fin both are set to no.

Version 0.8.8, September 7th, 2018
• Scipy version ≥

Version 0.8.9, September 10th, 2018

New boundary discontinuity features
• for fitted line slope differences separately for each boundary window definition std, trend, win and register representation bl, ml, rng, tl:
• post: f0 slope of post-boundary segment subtracted from slope of joint segment
• pre: f0 slope of pre-boundary segment subtracted from slope of joint segment
• prepost: f0 slope of post-boundary segment subtracted from slope of pre-boundary segment
• Features: myWindow_myRegister_sd_post | sd_pre | sd_prepost

Version 0.8.10, September 14th, 2018

New boundary discontinuity features
• for fitting error (RMSE) ratios and the Akaike information criterion increase of joint vs. single fits.
• separately for each boundary window definition std, trend, win and register representation bl, ml, rng, tl:
• single pre- and post-boundary vs. joint segment stylization
• pre-boundary vs. first half of joint window (pre)
• post-boundary vs. second half of joint segment (post)
• Features: myWindow_myRegister_rmsR | rmsR_post | rmsR_pre, myWindow_myRegister_aicI | aicI_post | aicI_pre

Version 0.8.11, September 24th, 2018

More robust treatment of multiple accent per accent group cases
• relevant if both fsys:loc:tier_ag and fsys:loc:tier_acc are specified
• new option preproc:loc_align with values skip (skipping AGs with more than 1 ACC), left (keeping first ACC), and right (keeping last ACC)

Commented configuration file added
• In the doc/ directory a new file copasul_commented_config.json.txt was added in which all options are commented for a quick overview.
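The slope difference features of version 0.8.9 and the preproc:loc_align values of version 0.8.11 can be illustrated as follows. This is a sketch under assumptions and not the toolkit's implementation: the function names are hypothetical, a plain least-squares line is fitted to the raw samples instead of a register representation, and accent groups without any accent are simply left out.

```python
def line_slope(t, y):
    """Least-squares slope of y over t."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    num = sum((a - mt) * (b - my) for a, b in zip(t, y))
    den = sum((a - mt) ** 2 for a in t)
    return num / den


def slope_differences(t_pre, y_pre, t_post, y_post):
    """Slope differences between joint, pre-, and post-boundary line fits."""
    s_pre = line_slope(t_pre, y_pre)
    s_post = line_slope(t_post, y_post)
    s_joint = line_slope(t_pre + t_post, y_pre + y_post)
    return {
        "sd_pre": s_joint - s_pre,      # pre slope subtracted from joint slope
        "sd_post": s_joint - s_post,    # post slope subtracted from joint slope
        "sd_prepost": s_pre - s_post,   # post slope subtracted from pre slope
    }


def align_accents(accents_per_ag, mode="skip"):
    """Map accent group index to a single accent time stamp."""
    if mode not in ("skip", "left", "right"):
        raise ValueError(f"unknown mode: {mode}")
    result = {}
    for i, acc in enumerate(accents_per_ag):
        if len(acc) == 1:
            result[i] = acc[0]
        elif len(acc) > 1 and mode == "left":
            result[i] = min(acc)   # keep first accent
        elif len(acc) > 1 and mode == "right":
            result[i] = max(acc)   # keep last accent
        # mode "skip": groups with more than one accent are left out
    return result
```

A rising pre-boundary segment followed by a falling post-boundary segment yields a positive sd_prepost value, while two segments with the same slope yield zero.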

Version 0.8.12, October 19th, 2018

bnd feature set extended
• for each register representation 2 more features were added that could be of use e.g. for measuring downstep.
• d_o: onset difference
• d_m: difference of means