IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, J-STSP-AAHD-00151-2019

Pragmatic Aspects of Discourse Production for the Automatic Identification of Alzheimer's Disease

Anna Pompili, Alberto Abad, David Martins de Matos, Senior Member, IEEE, and Isabel Pavão Martins

Abstract—Clinical literature provides convincing evidence that language deficits in Alzheimer's disease (AD) allow for distinguishing patients with dementia from healthy subjects. Currently, computational approaches have widely investigated lexicosemantic aspects of discourse production, while pragmatic aspects like cohesion and coherence are still mostly unexplored. In this work, we aim at providing a more comprehensive characterization of language abilities for the automatic identification of AD in narrative description tasks by also incorporating pragmatic aspects of speech production. To this end, we investigate the relevance of a recently proposed set of pragmatic features extracted from an automatically generated topic hierarchy graph in combination with a complementary set of state-of-the-art features encoding lexical, syntactic, and semantic cues. Experimental results on the DementiaBank corpus show an accuracy improvement from 82.6% to 85.5% in identifying AD patients when pragmatic features are incorporated into the set of lexicosemantic features. Nevertheless, these results are obtained relying on manual transcriptions, which strongly limits the applicability of computational analysis to clinical settings. Thus, in this work we additionally carry out an analysis of the errors introduced by a speech recognition system and the way in which they impact the performance of the proposed method. In spite of the high word error rates obtained on these data (~40%), automatic AD identification accuracy decreased only to 79.7%, which is considered a remarkable result when compared with solutions based on manual transcriptions.

Index Terms—Automatic speech analysis, clinical diagnosis, mental disorders, natural language processing

Manuscript submitted for review on May 15, 2019. This work was partially supported by Portuguese national funds through Fundação para a Ciência e a Tecnologia (FCT) under grants SFRH/BD/97187/2013 and UIDB/50021/2020. (Corresponding author: Anna Pompili.)

A. Pompili, A. Abad, and D. Martins de Matos are with INESC-ID, Instituto Superior Técnico, 1049-001 Lisboa, Portugal (e-mail: [email protected]; [email protected]; [email protected]).

I. Pavão Martins is with the Laboratório de Estudos de Linguagem, Faculdade de Medicina and Instituto de Medicina Molecular, 1649-004 Lisboa, Portugal (e-mail: [email protected]).

© 2020 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. Citation information: DOI 10.1109/JSTSP.2020.2967879.

I. INTRODUCTION

The prevalence of Alzheimer's disease (AD) in 2015 was 46.8 million people worldwide. Due to the increase in average lifespan, this figure is expected to triple by 2050, affecting 131.5 million people [1]. So far there is no treatment to stop or reverse the progression of the disease, though some may temporarily improve the clinical condition. AD is currently diagnosed through a review of patient clinical history and disability, neuropsychological tests assessing cognitive decline in different domains (memory, reasoning, language, and visuospatial abilities), brain imaging and cerebrospinal fluid exams. While the prominent symptom of the disease is memory impairment, existing literature provides evidence that discourse deficits are an important factor and may distinguish AD from healthy individuals [2], [3].

Impairments in language abilities are usually the result of a decline either in the semantic or pragmatic levels of language processing. Semantic processing is related to the content of language and involves words and their meaning. Pragmatic processing is concerned with the inappropriate use of language in social situations. It is likely that the semantic and pragmatic levels are interdependent, and that semantic deficits in word finding may contribute to pragmatic deficits that lead, for example, to the problem of maintaining the topic of a conversation [4].

The aspects described above are evaluated in the macrolinguistic dimension of discourse production by rating cohesion and coherence. While cohesion expresses the semantic relationship between elements, coherence is related to the conceptual organization of speech, and is usually analyzed through the study of local, global, and topic coherence. Local coherence refers to the conceptual links that maintain meaning between proximate propositions. Global coherence refers to the way in which the discourse is organized with respect to an overall plan. Finally, topic coherence refers to the organization and maintenance of the topics used within the discourse.

In the literature there has been a growing interest in investigating the computational analysis of language impairment in AD. Overall, existing works assess the quality of discourse production through the automatic analysis of a combination of lexical, syntactic, acoustic, and semantic features [5]–[11]. Very recently, some studies approached linguistic deficits at a higher level of processing, considering macrolinguistic aspects of discourse production such as cohesion, global, and local coherence [12], [13].

This study builds upon our recent work [14], in which we proposed a novel approach to automatically discriminate AD based on the analysis of topic coherence. In that work, a discourse was modeled as a graph encoding a hierarchy of topics. Then, a relatively small set of pragmatic features was extracted from this hierarchical structure and used to discriminate AD. Results showed comparable classification performance with the current state of the art. In this work, we extend our previously proposed method with the introduction of two additional contributions. First, the initial set of 16 topic coherence features is broadened with 21 new measures assessing pragmatic aspects of discourse. When compared with our previous work, this extended feature set allows for a relative error reduction of almost 9% in the classification of AD. In the second contribution, this set of features is further integrated with lexical, syntactic, and semantic features. To the best of our knowledge, this is the first work that considers these low-level linguistic abilities together with a computational analysis of topic coherence. In this work, we investigate how these different aspects of discourse production can contribute to an improved automatic discrimination of AD. To this end, the evaluation is performed considering three distinct groups of features: i) the topic coherence feature set, ii) a combination of lexical, syntactic, and semantic features, and iii) a combination of the previous two sets. Additionally, the method in [14] depends on accurate manually produced transcriptions of the speech narratives. This is a common requirement of many studies targeting an automatic characterization of linguistic impairments in AD. However, such a requisite strongly limits the applicability of computational approaches to clinical settings. In this work, we assess the impact of using automatically generated transcriptions of the spoken narratives produced by a speech recognition system. In this sense, we analyze the type of errors introduced and the way they impact the performance of the proposed method.

The rest of the document is structured as follows: Section II provides the literature review. In Sections III and IV, we present the dataset used in this study and a description of the methodology implemented to model a hierarchy of topics. Topic coherence and linguistic features are described in Sections V-A and V-B, respectively. Then, Section VI reports on classification experiment results. Finally, conclusions are summarized in Section VII.

II. RELATED WORK

In this section, the relevant literature review is introduced. First, language impairments in AD are briefly reported, then a short review on topic coherence analysis is presented. The section ends with a summary of recent computational works that analyzed linguistic impairments to identify AD.

A. Language impairment in Alzheimer’s disease

Language processing in dementia of the Alzheimer's type has been the object of extensive research over the last years. The most well-known symptoms of impaired language abilities in AD include naming deficits [2], word-finding difficulties [15], repetitions [16], an overuse of indefinite and vague terms [17], and inappropriate use of pronouns [18]. There is strong evidence that the general cognitive dysfunction in AD underlies apparent syntactic, semantic, and pragmatic deficits in language processing. Existing studies on syntactic abilities provided controversial results. Some authors agreed on the finding that the syntactic rule system was preserved even in later stages of the disease [15], [19]; working and semantic memory impairments constrained the linguistic output, causing syntactic changes. In the same way, several works confirmed that, among the hypothesized sources of naming impairments (semantic errors, visual deficits, and lexical retrieval difficulties), semantic processing was the predominant one [2], [20]. AD patients also showed poor performance in semantic fluency tests (e.g., list animal names) with respect to phonemic fluency (e.g., list words beginning with a given letter), sustaining the hypothesis of a category-specific semantic impairment [20], [21]. In conclusion, there is evidence supporting the theory that there are semantic knowledge deficits behind language impairments in AD.

When it comes to discourse, semantic deficits in language processing constrain the production of meaningful and coherent speech. Discourse production is a cognitively demanding process: besides a large neural network dedicated to phonological, lexical, and semantic processing, it also requires memory, world knowledge, pragmatics, and executive functions. As a result, it is not surprising that the discourse of AD patients is described as fluent but not informative, characterized by incomplete and short sentences, poorly organized, and with a disproportionate deficit in maintaining cohesion [22] and coherence [18], [23]. Studies assessing macrolinguistic abilities of language production in AD highlighted difficulties in maintaining a coherent speech that were frequently associated with topic management deficits. In particular, Glosser and Deser [24] found the discourse of AD impaired on measures of global and thematic coherence. Garcia and Joanette [25] have shown that AD patients changed topic more abruptly during discourse and had difficulties in relating new topics to old topics. These difficulties may be explained by a failure to maintain the topic of a conversation [26].

B. Topic coherence analysis

The analysis of topic coherence was introduced in 1991 in the work of Mentis and Prutting [27], whose focus was the study of topic introduction and management. A topic was described as a clause identifying the question of immediate concern, while a subtopic was an elaboration or expansion of one aspect of the main topic.

Several years later, Bradie et al. [28] analyzed topic coherence and topic maintenance in individuals with right hemisphere brain damage. This work extended the one of Mentis and Prutting [27] with the inclusion of the notions of sub-subtopic and sub-sub-subtopic. Topic and sub-divisional structures were further categorized as new, related, or reintroduced.

In a later study, Mackenzie et al. [29] used discourse samples elicited through a picture description task to determine the influences of age, education, and gender on the concepts and topic coherence of 225 healthy adults. Results confirmed education level as a highly important variable affecting the performance of healthy adults.

More recently, Miranda [30] investigated the influence of education in the macrolinguistic dimension of discourse evaluation, considering concept analysis, local, global and topic coherence, and cohesion. Results corroborated the ones obtained by Mackenzie et al. [29], confirming the effect of literacy in this type of analysis.

C. Computational approaches for AD classification

In recent years, there has been a growing interest from the research community in the computational analysis of language impairment in AD.

Some works have focused on a combination of temporal speech parameters and lexical measures, reaching an accuracy in classifying AD patients that ranges between 80% and 90% [5]–[7]. Each of these studies was performed using a custom corpus specifically collected. A common noteworthy result of these works is that, by combining acoustic and lexical features, the accuracy obtained in identifying AD patients always improves. On the other hand, an important limitation is concerned with the reduced amount of data used in cross-validation experiments.

Other studies have focused on lexical and syntactic features to discriminate AD [8], [9]. These works were based, respectively, on the corpus collected by Peraita and Grasso [31] and on the publicly available Carolina Conversation Corpus (CCC) [32]. In these studies, the authors explored part-of-speech (POS) features and measures of lexical richness. Classification accuracies are, respectively, 88% and 79.5%.

More recently, semantic changes have also been investigated. Fraser et al. [10] evaluated the amount of information conveyed in the DementiaBank corpus [33] by taking into account a predefined list of information content units. Using a selection of 35 features, the authors achieved a classification accuracy of 81.92% when distinguishing individuals with AD from healthy controls. Hernández-Domínguez et al. [11] automatically evaluated the amount of information content contained in the DementiaBank corpus [33] through measures of informativeness and pertinence. These measures are estimated against a referent automatically generated using the data from 25 healthy subjects. In the study, the authors also considered linguistic and phonetic features. However, the information coverage measure appeared to be the most strongly correlated with the severity of the disease, which was measured on a three-point rating scale (healthy = 0, MCI = 1, and AD = 2). Results show an F-score of 81% when identifying patients with AD.

Few studies approached linguistic deficits at a higher level of processing. Dos Santos et al. [12] investigated coherence and cohesion in the task of identifying Mild Cognitive Impairment (MCI), a milder and earlier stage of dementia, from healthy subjects. The authors used three different corpora: i) the DementiaBank [33], ii) the narratives of the Cinderella story, a corpus collected at the Medical School of the University of São Paulo, and iii) samples of the Arizona Battery for Communication Disorders of Dementia (ABCD) [34]. Discourse transcripts were modeled as a complex network enriched with 100-dimensional word embeddings trained on Wikipedia dumps. Classification was performed with measures characterizing the topology of the network and linguistic features. Depending on the dataset used, the accuracy achieved was 52%, 65%, and 74%. Toledo et al. [13] analyzed macrolinguistic aspects of speech in subjects with AD, MCI, and a healthy control group. To this end, the authors used the narratives of the Cinderella story. Feature extraction was performed with the tool Coh-Metrix-Dementia [35], which includes measures of lexical diversity, syntactic complexity, idea density, and text coherence through latent semantic analysis. Results confirmed that AD individuals presented difficulty in planning and organizing ideas with respect to the main topic, and produced a less informative discourse, characterized by more repetitions without introducing new information.

TABLE I
STATISTICAL INFORMATION ON THE COOKIE THEFT CORPUS

            Age range (avg.)   MMSE range (avg.)   Audio duration   N. of words
Controls    46-80 (63.84)      26-30 (29.06)       04h:13m          26591
AD          53-88 (71.31)      8-30 (19.36)        05h:04m          23029

Our study differs from previous works in this area in several ways. In fact, while the investigations reported in Sections II-A and II-B are based on a human assessment of topic coherence and management, our study relies on a computational analysis of topic coherence. Additionally, we integrate lexical, syntactic and semantic aspects of language production with measures of local, global, and topic coherence. To the best of our knowledge, this is the first work performing this type of computational analysis to classify AD.

III. THE COOKIE THEFT CORPUS

Data used in our work are obtained from the DementiaBank database [33], which is part of the larger TalkBank project. The collection was gathered in the context of a longitudinal study with yearly assessments; demographic data, together with the education level, are provided. Participants included elderly controls, people with MCI, and people with different types of dementia, including AD. Among other assessments, participants were required to provide a description of the Cookie Theft picture, shown in Fig. 1. Data are in English. Each speech sample was recorded and then manually transcribed at the word level. Narratives were segmented into utterances and annotated with disfluencies, filled pauses, repetitions, and other more complex events.

For the purposes of our study, only participants diagnosed with AD were selected, resulting in 234 speech samples from 147 patients. Control participants were also included, resulting in 241 speech samples from 98 speakers. Table I reports additional information about the size of the corpus, demographic and clinical data. More details about the study cohort can be found in [33].

IV. THE PROPOSED MODEL TO ANALYZE TOPIC COHERENCE

The topics used during discourse production should contain an internal, structural organization, in order to achieve an information hierarchy. This organizational structure allows a gradual organization of information that is essential for an effective communication [36]. Being important for both the speaker and the listener, this type of organization highlights the key concepts and indicates the degrees of importance and relevance within the discourse. Mackenzie et al. [29] in their work provided an example of a topic hierarchy based on the Cookie Theft picture description task, which was later extended in the study of Miranda [30]. An excerpt of this hierarchy is also reported in Fig. 2.

Fig. 1. The Cookie Theft picture, a widely used stimulus contained in the Boston Diagnostic Aphasia Examination [37].

The number of relevant topics that can be described from the Cookie Theft picture is limited to the concepts that are explicitly represented in the image (e.g., garden), and to those that can be implicitly suggested by the scene (e.g., weather). Taking this into account, the problem of building a topic hierarchy from a transcript can be modeled with a semi-supervised approach in which a predefined set of topic clusters is used to guide the assignment of a new topic to a level in the hierarchy.

Both for the creation of the topic clusters and for the analysis of a new discourse sample, a multistage approach is followed to transform the original transcriptions into a representation suitable for subsequent analysis, as shown in Fig. 3. Initially, the transcriptions are preprocessed, then syntactic information is extracted and used to separate sentences into clauses and to identify coreferential expressions. Finally, a sentence vector representation is computed based on the word embeddings extracted for each word in a clause. Each stage of this process is further described in the following sections.

A. Preprocessing

With the final aim of building an automated system that requires a minimum annotation effort (either manual or automatic), we decided to rely only on word-level transcriptions. Consequently, as a preprocessing step, all the annotations not corresponding to the plain textual representation of words, such as overlapping speech markers and disfluencies, were removed, and contractions were expanded to their canonical form.

B. Clause segmentation

The next step requires identifying dependent and independent clauses. In fact, complex, compound, or complex-compound sentences may contain references to multiple topics. The sentence /the sink is overflowing while she is wiping a plate and not looking/ is an example of this problem, as it is composed of an independent clause (/the sink is overflowing/) and a dependent one (/she is wiping a plate and not looking/).

A possible way to cope with the separation of different sentence types is by using syntactic parse trees. Thus, in a similar way to the work of Feng et al. [38], clause and phrase-level tags are used for the identification of dependent and independent clauses. For the former, the tag SBAR is used, while for the latter, the proposed solution checks the sequence of nodes along the tree to verify if the tag S or the tags [NP VP] appear in the sequence [39].

Fig. 2. An excerpt of a topic hierarchy for the Cookie Theft picture found in the work of Miranda [30].
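As a rough illustration of this clause-splitting rule, the following sketch collects SBAR subtrees as dependent clauses and S subtrees dominating both an NP and a VP as independent clauses, assuming an NLTK-style bracketed constituency parse is already available; the function name and the exact traversal are illustrative, not the authors' implementation.

```python
# Minimal sketch of the clause identification rule, assuming NLTK is installed
# and a bracketed constituency parse (e.g., from the Stanford parser) is given.
from nltk import Tree

def clause_spans(parse_str):
    """Return (dependent, independent) clause strings of one parsed sentence."""
    tree = Tree.fromstring(parse_str)
    dependent, independent = [], []
    for pos in tree.treepositions():
        sub = tree[pos]
        if not isinstance(sub, Tree):
            continue
        # skip anything already inside an SBAR, so each clause is reported once
        inside_sbar = any(tree[pos[:i]].label() == "SBAR" for i in range(1, len(pos)))
        child_labels = [c.label() for c in sub if isinstance(c, Tree)]
        if sub.label() == "SBAR" and not inside_sbar:
            dependent.append(" ".join(sub.leaves()))
        elif (sub.label() == "S" and not inside_sbar
              and "NP" in child_labels and "VP" in child_labels):
            # note: a fuller implementation would subtract embedded SBAR spans
            independent.append(" ".join(sub.leaves()))
    return dependent, independent
```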

C. Coreference analysis

The analysis of coreference proves to be particularly useful in higher-level NLP applications that involve language understanding, such as discourse analysis [40]. Strictly related with the notions of anaphora and cataphora, coreference resolution makes it possible to identify when two or more expressions refer to the same entity in a text.

In our work, the analysis of coreference has been performed with the Stanford coreference resolution system [41], taking into account the segmentation performed in the previous step.

During the process of building the hierarchy, coreference information is used to condition the assignment of a subtopic to the corresponding level in the hierarchy (see Section IV-E2 for further details).

For this purpose, we constrain the results provided by the coreference system to those relationships in which the referent and the referred terms are mentioned in different clauses, and to those referred mentions that belong to the set of third-person personal pronouns (i.e., he, she, it, they).
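This constraint can be expressed compactly; the sketch below assumes, for illustration only, that coreference chains have already been flattened into lists of (clause index, mention text) pairs, which is not the native output format of the Stanford system.

```python
# Illustrative filter over flattened coreference chains; the chain layout is an
# assumption made for this sketch, not the Stanford coreference output format.
THIRD_PERSON_PRONOUNS = {"he", "she", "it", "they"}

def filter_coreference(chains):
    """Keep links whose referred mention is a third-person personal pronoun and
    appears in a different clause than the referent (first mention of a chain)."""
    kept = []
    for chain in chains:
        referent_clause, referent_text = chain[0]
        for clause_idx, mention in chain[1:]:
            if mention.lower() in THIRD_PERSON_PRONOUNS and clause_idx != referent_clause:
                kept.append((referent_text, mention, clause_idx))
    return kept
```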

D. Sentence embeddings

Fig. 3. The proposed method for modeling discourse as a hierarchy of topics (pipeline stages: preprocessing, clause segmentation, coreference analysis, sentence embedding, topic cluster definition, and topic hierarchy building).

In the last step of the pipeline, discourse transcripts are converted into a representation suitable to compare and measure differences between sentences. In particular, the transformed transcripts should be robust to syntactic and lexical differences and should provide the capability to capture semantic regularities between sentences. For this purpose, we rely on a pre-trained model of word vector representations containing 2 million word vectors, in 300 dimensions, trained with fastText on Common Crawl [42]. In the process of converting a sentence into its vector space representation, we first perform a selection of four lexical items that were considered more informative for the task at hand: nouns, pronouns, verbs and adjectives. Then, for each word, we extract the corresponding word vector and finally we compute the average over the whole sentence.
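The embedding step can be sketched as follows; the model path, the use of NLTK for POS tagging, and the handling of clauses without content words are assumptions made for the example, not details taken from the paper.

```python
# Sketch of the clause embedding computation: average fastText vectors over
# nouns, pronouns, verbs, and adjectives. Assumes the pre-trained Common Crawl
# model has been downloaded and the NLTK tokenizer/tagger data are installed.
import numpy as np
import nltk
import fasttext

model = fasttext.load_model("crawl-300d-2M-subword.bin")   # path is an assumption
CONTENT_TAG_PREFIXES = ("NN", "PRP", "VB", "JJ")           # nouns, pronouns, verbs, adjectives

def clause_embedding(clause):
    """Average the word vectors of the content words of a clause."""
    tagged = nltk.pos_tag(nltk.word_tokenize(clause))
    vectors = [model.get_word_vector(word.lower())
               for word, tag in tagged if tag.startswith(CONTENT_TAG_PREFIXES)]
    return np.mean(vectors, axis=0) if vectors else np.zeros(model.get_dimension())
```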

E. Topic hierarchy analysis

To create a topic hierarchy from a transcript, we follow a methodology that is partly inspired by current clinical practice. Thus, in modeling the problem, we do not want to impose a predefined order or structure in the way topics and subtopics may be presented, as this will depend on how the discourse is organized. However, we can take advantage of the closed-domain nature of the task to define a reduced number of clusters of broad topics that will help to guide the construction of the hierarchy and the identification of off-topic clauses.

1) Topic clusters definition: As mentioned, the proposed solution relies on the supervised creation of a predefined number of clusters of broad topics. Each cluster contains a representative set of sentences that are related with the topic of the cluster. 10 clusters were defined: main scene, mother, boy, girl, children, garden, weather, unrelated, incomplete, and no-content. The purpose of the cluster unrelated was to match those sentences in which the participant is not performing the task (e.g., questions directed to the interviewer). The clusters incomplete and no-content are expected to match sentences that may potentially be characteristic of a language impairment. They identify fragments of text (e.g., /overflowing sink/) and expressions that do not add semantic information about the image (e.g., /what is going on/). To build the clusters, around 35% of the data from the control group is used. Each sentence has been manually annotated with the corresponding cluster label and clusters are simply modeled by the complete set of sentences belonging to them. These clusters are used as topic references for building the topic hierarchy of new transcriptions, as described next.

2) Topic hierarchy building algorithm: The algorithm to build the topic hierarchy relies on the cosine similarity between sentence embeddings. The first step consists of verifying which is, among the 10 topic clusters defined, the one that best matches the content of the current sentence (Fig. 4a). This is achieved by computing the cosine similarity between the current sentence embedding and each sentence embedding in each topic cluster. The highest result determines the cluster for the new sentence. In the following step, we need to assign the current sentence embedding to a level in the current hierarchy (Fig. 4b). This implies establishing whether we are dealing with a new or a repeated topic and its level of specialization (i.e., subtopic, sub-subtopic, etc.). This is achieved by first identifying, in the current hierarchy, the subgraph whose nodes belong to the same cluster of the current sentence (e.g., the subgraph corresponding to the mother cluster in Fig. 4). Then, we compute the cosine similarity between the current sentence and each node of this subgraph. This process is shown in Fig. 4c. The new sentence is considered a child of the closest node if the similarity is higher than a threshold. Otherwise, it is considered a repeated topic (Fig. 4d). If there is no such subgraph, the sentence embedding is added as a new topic. If the new topic turns out to be a coreferential expression, this kind of information supersedes the cosine metric strategy, and the new topic is added directly as a child of its referent.

Fig. 4. Topic hierarchy building algorithm. a) The current sentence is compared with the topic clusters to identify its topic. b) Identification of the level of specialization of the current sentence. If there are no nodes with the same topic as the current sentence, it is considered a new topic. c) If the current hierarchy contains one or more nodes with the same topic as the current sentence, each of them is analyzed with respect to the current one. d) As a result, the current sentence is added as a child of its closest node.
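The assignment step can be summarized with the following sketch. The node layout, the similarity threshold, and the treatment of coreference (reduced here to a comment) are illustrative assumptions rather than the authors' exact implementation.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def all_nodes(nodes):
    """Yield every node of the hierarchy (list of root topics with nested children)."""
    for node in nodes:
        yield node
        yield from all_nodes(node["children"])

def add_sentence(sent_vec, clusters, roots, repeated, threshold=0.7):
    """Place one sentence embedding in the topic hierarchy.
    `clusters` maps a cluster name to its reference sentence embeddings;
    `roots` is the list of top-level topic nodes; the threshold is an assumption.
    Coreference information, when available, would bypass step 2 and attach the
    sentence directly under its referent's node."""
    # 1) topic identification: best-matching predefined cluster
    cluster = max(clusters, key=lambda c: max(cosine(sent_vec, v) for v in clusters[c]))
    node = {"vec": sent_vec, "cluster": cluster, "children": []}
    # 2) topic level identification within the subgraph of that cluster
    same_cluster = [n for n in all_nodes(roots) if n["cluster"] == cluster]
    if not same_cluster:
        roots.append(node)                    # no subgraph yet: new topic
    else:
        closest = max(same_cluster, key=lambda n: cosine(sent_vec, n["vec"]))
        if cosine(sent_vec, closest["vec"]) > threshold:
            closest["children"].append(node)  # specialization of the closest node
        else:
            repeated.append(node)             # otherwise counted as a repeated topic
    return cluster
```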

V. FEATURES FOR AD SPOKEN DISCOURSE CHARACTERIZATION

In this section, we report the set of features used in this work. First, pragmatic features are described. These are computed from the topic hierarchy and include measures of topic, global, and local coherence. Then, a set of additional lexical, morphosyntactic, and semantic features is introduced. The complete set of features is listed in Table II.

A. Topic coherence features

From the output that is produced at each step of the processing pipeline (i.e., from the processing steps described in Sections IV-A to IV-D, and shown in Fig. 3), and from the final topic hierarchy, a set of 37 measurements was identified as of potential interest to characterize topic coherence.

In a similar way to the standard clinical evaluation, we accounted for the number of topics, subtopics, sub-subtopics and sub-sub-subtopics introduced, and for the total number of repeated topics. Then, with the aim of investigating more thoroughly the subtopics produced, this work also introduces new features that consider the number of topics produced in each topic cluster related with the Cookie Theft picture. Additionally, in the literature the mean cosine values between all possible pairs and between adjacent pairs of sentences have been used as measures of global and local coherence, respectively [43]. In the current study, this approach has been extended to the set of topics constituting the final hierarchy, to the set of repeated topics, and to those classified as unrelated or no-content. The complete set of features is reported in Table II.
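For instance, the local and global coherence statistics (T19-T24) can be computed as in the following sketch, where topic_vecs is the ordered list of topic embeddings; the helper is a generic illustration and not tied to the authors' code.

```python
import itertools
import numpy as np

def _cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def coherence_stats(topic_vecs):
    """Mean, standard deviation, and coefficient of variation of the cosine
    similarity between temporally consecutive topics (local coherence) and
    between all pairs of topics (global coherence)."""
    def stats(values):
        if not values:
            return 0.0, 0.0, 0.0
        arr = np.array(values)
        mean, std = float(arr.mean()), float(arr.std())
        return mean, std, (std / mean if mean else 0.0)
    local = [_cosine(a, b) for a, b in zip(topic_vecs, topic_vecs[1:])]
    global_ = [_cosine(a, b) for a, b in itertools.combinations(topic_vecs, 2)]
    return stats(local), stats(global_)
```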

B. Other linguistic features

To analyze linguistic deficits in the context of narrative speech, and thus provide a more comprehensive evaluation of language abilities, we integrate the topic coherence feature set with a number of lexical, syntactic, and semantic features. These features and the methodology used to compute them are detailed in the remainder of this section.

1) Lexical features: An excessive use of indefinite and generic terms could be analyzed through measures that aim at revealing the richness and diversity of the lexicon. For this purpose, one of the most widely reported metrics, which has been used in many linguistic and clinical research studies [44], [45], is the type-token ratio (TTR). It is a sample measure of vocabulary size, representing the ratio of the total vocabulary to the overall text length. In addition, we also computed Brunet's index [46] and Honoré's statistic [47], two alternative measures of richness of vocabulary. Brunet's index quantifies lexical richness without being sensitive to text length, while Honoré's statistic evaluates the richness of a lexicon by counting the number of words that occur only once.
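For reference, these three measures are commonly defined as follows, with N the number of tokens, V the vocabulary size, and V1 the number of words occurring exactly once; the exponent 0.165 is the constant usually reported for Brunet's index, and the exact variants used in the cited works may differ slightly.

```latex
\mathrm{TTR} = \frac{V}{N}, \qquad
W = N^{\,V^{-0.165}}, \qquad
R = \frac{100\,\log N}{1 - V_1/V}
```

Typically, lower values of W and higher values of R indicate a richer vocabulary.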

2) Morphosyntactic and syntactic features: Several studies have found that the discourse of AD patients contains an overuse, often inappropriate, of pronouns [3], [18]. AD patients have also shown impaired verb production, verb naming, and impaired verb knowledge in sentence processing [2]. To account for these and other linguistic phenomena, we computed the frequency of occurrence of different word classes by relying on the POS information obtained with the Stanford parser [48]. The frequency of each class is computed at the sentence level and then normalized by the total number of words in a narrative. Finally, frequencies are averaged over all the sentences.

In the same vein as word class frequencies, we account for the frequency of different types of production rules. In language theory, the set of production rules is used to describe the grammar of a language. This type of analysis has already been used in problems aiming at identifying AD and related dementias [10], [49]. We count the frequency of occurrence of different production rules and then normalize by the total number of rules in the narrative. Overall, the total number of syntactic and morphosyntactic features is 52.
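A minimal sketch of the production-rule counting is given below, assuming NLTK and bracketed constituency parses; keeping only non-lexical rules via is_nonlexical() is our simplification and may differ from the exact rule inventory (M6-M41) used in the paper.

```python
# Count phrase-structure production rules (e.g., NP -> DT NN) over a narrative
# and normalize by the total number of rules extracted.
from collections import Counter
from nltk import Tree

def production_rule_frequencies(parse_strings):
    counts = Counter()
    for parse_str in parse_strings:
        tree = Tree.fromstring(parse_str)
        counts.update(str(p) for p in tree.productions() if p.is_nonlexical())
    total = sum(counts.values())
    return {rule: n / total for rule, n in counts.items()} if total else {}
```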

3) Semantic features: A decline in semantic content is consistent with the claims that describe the discourse of AD patients as empty, containing little or no information [17].

In this study, to account for information content features, we follow the approach described in the work of Croisile et al. [16], in which the authors examined 23 information content units (ICU) in four categories (i.e., subjects, places, objects, and actions). The features corresponding to the categories subjects, objects, and places were computed by simply verifying the mention of the corresponding items in the text. To compute the mentioning of the ICU category actions, we examine the dependency representations provided by the Stanford Parser [48].

Finally, in a similar way to the work of Fraser et al. [10], we additionally complement the set of ICU features with the occurrences of specific words that may be of relevance to the Cookie Theft picture. Overall, the total number of semantic features is 49.
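The ICU features for the subjects, objects, and places categories reduce to simple mention checks, as in the sketch below; the small lexicon is purely illustrative and not the actual word lists derived from Croisile et al. [16].

```python
# Hypothetical ICU lexicon: a few content units mapped to accepted surface forms.
ICU_LEXICON = {
    "boy":    {"boy", "son", "brother"},
    "girl":   {"girl", "daughter", "sister"},
    "cookie": {"cookie", "cookies"},
}

def icu_mentions(transcript_tokens):
    """Binary indicator per ICU: was the concept mentioned in the narrative?"""
    tokens = {t.lower() for t in transcript_tokens}
    return {icu: int(bool(forms & tokens)) for icu, forms in ICU_LEXICON.items()}
```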

VI. RESULTS AND DISCUSSION

AD classification experiments have been performed on a subset of the Cookie Theft corpus using a Random Forest classifier. As described in Section IV, 35% of the data was held out to define the topic clusters. Hence, only 65% of the data has been used for experimental validation, that is, for training and testing AD classifiers. This consists of 148 discourse samples from the control group and 153 from the dementia group. On average, each transcribed narrative contained around 12-13 sentences, with the patient group producing shorter descriptions. A stratified k-fold cross-validation per subject strategy was performed, with k = 10 folds. In the following, we report the average and range accuracy at the 90% confidence level computed from the results of each fold.

In order to identify the most discriminant features for AD classification, we implemented a method based on sequential forward selection (SFS) [50]. The SFS algorithm is an iterative search approach in which a model is trained with an incremental number of features. Starting with no features, at each iteration the accuracy of the model is tested by adding, one at a time, each of the features that were not selected in a previous iteration. The feature that yields the best accuracy is retained for further processing. The method ends when the addition of a new feature does not improve the performance of the model. In this work, we implemented a variation of the SFS that explores a larger feature space in order to find better solutions to the problem at hand. That is, we removed the constraint of terminating the search as soon as a first local maximum is found, and performed an extended search until the last feature was selected. Then, the modified approach selects the minimal set of features that meets a certain performance convergence criterion. In this case, this is defined by the attainment of a classification performance at most 1% worse than the global maximum.

TABLE II
SUMMARY OF ALL EXTRACTED FEATURES (141 IN TOTAL). THE NUMBER OF EACH TYPE OF FEATURES IS REPORTED IN PARENTHESIS.

Topic coherence:
- Number of topics (T1), subtopics (T2), sub-subtopics (T3), and sub-sub-subtopics introduced (T4).
- Number of topics produced in each topic cluster, namely main scene (T5), mother (T6), boy (T7), girl (T8), children (T9), garden (T10), weather (T11).
- Proportion of dependent (T12) and independent clauses to the total number of sentences (T13).
- Total number of coreferential mentions (T14).
- Total number of repeated topics, subtopics, sub-subtopics, and sub-sub-subtopics (T15).
- Number of sentences that were classified as unrelated (T16), incomplete (T17), or no-content (T18) in the first step of the main algorithm.
- Mean, standard deviation, and coefficient of variation (the ratio of the standard deviation to the mean) of the cosine similarity between two temporally consecutive topics (T19-T21), all pairs of topics (T22-T24), all pairs of repeated topics (T25-T27), and those classified as unrelated (T28-T30) or no-content (T31-T33).
- Length of the longest path from the root node to all leaves (T34).
- Average number of outgoing edges of all nodes (T35).
- Total number of sentences (T36).
- Ratio of dependent to independent clauses (T37).

Lexical:
- TTR (L1), Brunet's index (L2), and Honoré's statistic (L3).

Morphosyntactic and syntactic:
- Word class frequencies: adverbs (M1), verbs (M2), nouns (M3), pronouns (M4), and adjectives (M5).
- Number of times a production rule is used (M6-M41).
- Rate, proportion, and average length of noun (M42-M44), verb (M45-M47), and prepositional phrases (M48-M50). Ratio of nouns to verbs (M51) and of pronouns to nouns (M52).

Semantic:
- ICUs to consider the mention of a key concept in the Cookie Theft picture (S1-S23).
- Frequency of occurrence of specific keywords relevant for the Cookie Theft picture (S24-S49).
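The extended SFS variant can be sketched as follows, where evaluate(subset) stands for the cross-validated Random Forest accuracy and the 0.01 tolerance mirrors the 1% convergence criterion; this interface is an assumption made for illustration.

```python
def extended_sfs(evaluate, features, tolerance=0.01):
    """Greedily rank all features by sequential forward selection, then keep the
    smallest prefix whose accuracy is within `tolerance` of the global maximum."""
    selected, remaining, history = [], list(features), []
    while remaining:
        scores = {f: evaluate(selected + [f]) for f in remaining}
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
        history.append(scores[best])
    best_acc = max(history)
    for k, acc in enumerate(history, start=1):
        if acc >= best_acc - tolerance:
            return selected[:k], acc
```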

A. Experiments using manual transcriptions

This section reports on the validation of different sets of features computed on the manual transcriptions: experiments with topic coherence features, with additional linguistic features, and with the fusion of these two sets.

1) Topic coherence features results: Using only the set of features computed through the multistage approach yielding the topic hierarchy, we obtained an average accuracy of 79.0% ± 4.8% in classifying AD. This performance is achieved with a selection of 11 features out of 37. Fig. 5 reports the classification accuracy obtained with the SFS algorithm on this subset of features.

From this figure, it is possible to note that the number of topics (T1) was the first feature selected, providing, alone, an average accuracy of 66%. The second and the fifth features selected were the coefficient of variation between two temporally consecutive topics (T21) and between all pairs of topics (T24). Nevertheless, other statistical measures related with the mean and standard deviation of the cosine value between adjacent (T19, T20) and all pairs of topics (T22, T23) were not considered discriminant for classification by the SFS algorithm. These features were added in this work because in the literature they have been used as an index of local and global coherence [12], [13], [43]. In order to understand why they were not considered relevant for classification, we analyzed in more detail the average values of our measures. In this way, we note that differences between the AD and the control group are relatively small, which may represent a possible explanation. From this analysis, we also discovered that, in agreement with the findings of Toledo et al. [13], the AD group achieved higher scores in the mean value of the cosine between adjacent topics, rather than between all pairs of topics. This difference has been associated with a greater difficulty in keeping the theme throughout the discourse.

Another relevant result regards the number of coreferential mentions (T14) and the proportion of dependent clauses (T12). Analyzing these data in more detail, we confirm that, on average, there is a large difference between the two groups for these statistics. We also note two opposite patterns. AD patients produced a greater number of coreferential mentions and a reduced number of dependent clauses with respect to the control group. These results are in agreement with the findings that AD speech is characterized by an increased use of pronouns and a reduced number of subordinate clauses [17]–[19].

Fig. 5. Variation of the classification accuracy with the SFS method, while increasing the number of features. Results are presented for the set of topic coherence features that provided the maximum accuracy (selected features, in order: T1, T21, T16, T10, T24, T14, T12, T35, T8, T34, T3).

2) Linguistic features results: Using the SFS feature selection approach with the linguistic features, we found that the accuracy improves to 82.6% ± 5.1%. This result was achieved with the selection of 55 features, out of 104. This outcome is consistent with those achieved in similar works of the state of the art [9], [11], and in particular with the one of Fraser et al. [10]. In fact, we recall that the range of linguistic measures analyzed in this work can be considered a subset of those implemented by Fraser et al. In this way, we are able to draw a comparison between the two approaches. In both works, the frequencies of adverbs (M1), verbs (M2), and nouns (M3) were identified in the set of the most important features. The same applies to the rate of prepositional phrases (M48). An analysis of the mean values of these features in the two groups confirmed a trend in agreement with the current state of the art. AD speech contains a reduced number of nouns, verbs, and prepositional phrases [16], [19], [51], [52]. Regarding semantic features, a partial overlap was also found in the set of ICUs and word occurrences that were considered relevant for classification.

3) Fusion of features results: When combining the topic coherence features with the set of lexical, syntactic, and semantic features, the accuracy improves from 82.6% ± 5.1% to 85.5% ± 2.9%. This result is achieved with the identification of a restricted number of features, only 19 out of 141, corresponding to 13% of the total number of features. This is a relevant outcome considering that the use of these two sets individually required the selection of 30% and 53% of the total number of features, achieving a lower accuracy. As shown in Fig. 6 (top), the selected subset includes 16 features assessing syntactic and semantic abilities, and 3 features evaluating pragmatic aspects of language. This distribution confirms that the set of other linguistic features is more comprehensive and covers a wide range of phenomena, assessing language impairment at different levels of processing. On the other hand, pragmatic features encode in a compact way a different type of information, complementing lower-level aspects of language production.

B. Experiments using automatic transcriptions

The generation of manual transcriptions of discourse samples is a laborious, time-consuming task that also requires expert linguistic knowledge. This need hampers the applicability of the type of proposed computational analysis to clinical settings. Thus, the use of a speech recognition system to automatically produce the transcriptions can remove this constraint. However, this may be at the cost of a negative impact on performance due to recognition errors. Nowadays, state-of-the-art automatic speech recognition (ASR) systems can achieve accuracy levels comparable to human transcribers (word error rate (WER) of ∼5-6%) in certain spontaneous speech recognition tasks [53]. Nevertheless, when it comes to the recognition of atypical speech –such as speech from elderly people– recognition errors typically get worse. This performance drop is even exacerbated in the case of speech and language affecting diseases, reaching a WER around 35% [54].

In this section, we describe our approach to automatic transcription generation and assess its impact on the AD classification task.

1) Automatic transcription generation: In this work, we used the Google Cloud Speech-to-Text (STT) [55] API to obtain the automatic transcriptions for the Cookie Theft corpus. Recordings that were originally encoded in MP3 format were converted to 16 kHz sampling rate WAV audio files, a coding format accepted by the Google API. The quality of the recordings is quite poor, as they were originally collected on tapes in the late '80s. Besides, some of them contain background noise or a very low voice. However, no speech enhancement technique was applied. Moreover, before performing ASR, the original recording sessions were segmented in order to remove the clinician interventions, which are extraneous to the description of the participant and need to be ignored. For this segmentation task, a speaker diarization system could have been used [56]. However, in this work we used the sentence-level manual transcriptions to obtain the utterance boundaries. In addition to speaker changes, the corpus manual annotations were also used to obtain manual sentence boundaries. This is due to the fact that the ASR system provided automatic sentence boundaries –based on long pauses– that did not correspond to the actual end of the sentence. In fact, a correct identification of sentence boundaries is essential for modeling the topic hierarchy in the proposed method and for the extraction of some of the features described in Section V. While there are natural language processing tools that perform automatic sentence segmentation, most of them require the speech transcripts to include accurate punctuation marks, a feature that is not currently available.
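For reference, obtaining a transcript for one pre-segmented utterance with the current Google Cloud Speech-to-Text Python client looks roughly like the sketch below; the file name and language code are assumptions, authentication setup is omitted, and the client interface may differ from the one available when the experiments were run.

```python
# Rough sketch: transcribe one 16 kHz mono WAV utterance segment.
from google.cloud import speech

client = speech.SpeechClient()
with open("utterance_001.wav", "rb") as f:          # hypothetical segment file
    audio = speech.RecognitionAudio(content=f.read())
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
)
response = client.recognize(config=config, audio=audio)
transcript = " ".join(r.alternatives[0].transcript for r in response.results)
```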

2) AD classification results: Automatic transcriptions were obtained both for the train and test data. Apart from the way in which the transcriptions were generated, these experiments are performed with the exact same procedure used in the previous section (VI-A3). This means that the algorithm used to build the hierarchy, the dataset separation, and also the partitions used in cross-validation experiments are exactly the same. The experiments were performed using the complete set of features that includes topic coherence and other linguistic features. In this way, we achieved a classification accuracy of 79.7% ± 3.5%. This result was obtained through the selection of 10 features out of 141, corresponding to 7% of the total number of features. Table III reports a summary of the results achieved in the task of AD classification, using both manual and automatic transcriptions.

Fig. 6. Accuracy achieved with the top selected features using the fusion of different sets. Results are computed on the manual transcriptions (top) and on the automatic transcriptions (bottom).

A comparison between the set of features that provided the best classification accuracy on the automatic transcriptions and those identified on the manual transcriptions can be done by observing Fig. 6. In this way, it is possible to notice that the selected subset includes 8 features assessing lexical, syntactic, and semantic abilities, and 2 features evaluating pragmatic aspects of language. That is, 20% in contrast to the 13% obtained when using manual transcriptions. Interestingly, the total number of coreferential mentions (T14) is the only pragmatic feature that appears in both selections. Although the type of information used in the analysis of coreference is strongly affected by recognition errors, this feature continues to be of particular relevance for the discrimination of the disease, as confirmed by an analysis of its mean values. The number of topics (T1) is now selected, while the number of incomplete sentences (T17) and the average number of outgoing edges (T35) are no longer selected. These observations suggest, on the one hand, that ASR errors affect some of the proposed features differently and, on the other hand, that a small number of these features seem to be relatively insensitive to these errors (20% selected in contrast to 13%, as noted previously).

TABLE III
SUMMARY OF AD CLASSIFICATION RESULTS (AVG. AND RANGE ACCURACY %)

            Manual transcriptions                            Automatic transcriptions
            Topic coherence   Linguistic   Fusion            Fusion
Accuracy    79.0±4.8          82.6±5.1     85.5±2.9          79.7±3.5

3) Transcription errors analysis: As expected, due to recognition errors, we witness a negative impact on the model's performance in terms of classification accuracy. The WER –computed using the manual transcriptions of the Cookie Theft corpus as ground truth– for the control and the AD groups was, respectively, 37% and 43%. An analysis of the audio recordings confirmed that such a high error rate is partly due to the poor quality of the recordings. In fact, those audio segments that reported a higher error rate either contained a lot of background noise, or the recorded voice was of very low energy. Another possible source of error for the recognizer is related with the speaking style of the subjects, which in some cases was really fast, presenting a high rate of coarticulation. Nevertheless, it is worth noting the robustness of the selected set of features and the proposed method, which is able to achieve up to a 79.7% AD classification accuracy even with such a high WER.

Analyzing ASR results in more detail, it was possible to verify that deletions (i.e., words or sentences that were not recognized) were the main source of error, followed by substitutions (i.e., words that were incorrectly recognized). Indeed, both for the AD and the control group, the number of deletions was almost double the number of substitutions, contributing, respectively, 26% and 22% of the error. Additionally, we analyzed for both groups the features obtained with the manual and the automatic transcripts. In this way, it was possible to observe that the majority of the features computed with the automatic transcriptions showed a lower average value in comparison to the manual ones. This outcome is in agreement with the high rate of deletions found. Only a reduced number of features showed an opposing trend, that is, their average value was found to be higher with respect to the corresponding value computed on the manual transcriptions. Among the topic coherence features, this phenomenon was observed for the number of sentences classified as incomplete (T17). As expected, due to deletions its average value considerably increases, resulting however in a less discriminant feature for the AD classification task, as the experiments with automatic transcriptions confirm. Lexical and syntactic features also reflected the alterations existing in the transcriptions by showing an increasing trend in some measures. Among them, we mention Honoré's statistic [47] (L3), the frequency of nouns (M3), and other features that are related with this measure. Notably, a higher value of Honoré's statistic is associated with the use of a richer vocabulary; this phenomenon is most likely due to substitution errors. The frequency of nouns (M3), the proportion of nouns to verbs (M51), and the rate, proportion, and average length of noun phrases (M42-M44) also showed an increasing trend in both groups. A closer inspection revealed that this increment was indeed justified by a reduction in the number of noun phrases and by a general reduction of sentence length.
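The error-type breakdown follows from a standard word-level Levenshtein alignment, as in the generic sketch below (WER = (S + D + I) / N over the N reference words); this is a generic implementation, not the specific scoring tool used in the paper.

```python
def wer_breakdown(ref, hyp):
    """Word error rate with substitution/deletion/insertion counts from a
    minimum-cost word alignment between reference and hypothesis."""
    R, H = len(ref), len(hyp)
    # d[i][j] = (cost, substitutions, deletions, insertions) for ref[:i] vs hyp[:j]
    d = [[(0, 0, 0, 0)] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        d[i][0] = (i, 0, i, 0)          # delete all reference words
    for j in range(1, H + 1):
        d[0][j] = (j, 0, 0, j)          # insert all hypothesis words
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            if ref[i - 1] == hyp[j - 1]:
                d[i][j] = d[i - 1][j - 1]
            else:
                c, s, dl, ins = d[i - 1][j - 1]
                sub_op = (c + 1, s + 1, dl, ins)
                c, s, dl, ins = d[i - 1][j]
                del_op = (c + 1, s, dl + 1, ins)
                c, s, dl, ins = d[i][j - 1]
                ins_op = (c + 1, s, dl, ins + 1)
                d[i][j] = min(sub_op, del_op, ins_op)
    cost, subs, dels, ins = d[R][H]
    return {"wer": cost / R if R else 0.0, "sub": subs, "del": dels, "ins": ins}
```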

VII. CONCLUSIONS

In this work, we explored an approach to automatically classify AD based on the analysis of a wide set of linguistic features. Pragmatic abilities of language processing are evaluated by modeling discourse samples as a hierarchy of information. Through this process, a number of features related to the analysis of topic coherence are computed. This set is then complemented with various lexical, syntactic, and semantic features in order to provide a comprehensive evaluation of language deficits in AD. Classification experiments achieved an accuracy of 85.5%, which is in line with the current state of the art.

These results confirm that, by incorporating features representing pragmatic aspects of discourse, we attain a better characterization of language impairments in AD, which contributes to an improved ability of current computational approaches to provide an objective, complementary evaluation. Additionally, we evaluated the impact of using a speech recognizer to automatically produce the transcriptions needed for this type of computational analysis, an important step towards the applicability of this kind of approach in a clinical setting. In this case, AD identification accuracy dropped to 79.7%, mostly due to the negative effect of ASR deletion errors. Nevertheless, this is still a remarkable disease classification result considering that the WER on these data was around 40%.

While the results presented in this study are quite promising, there is still much room for improvement in the proposed approach, which may be the subject of future research. First, fully automatic speaker and sentence segmentation stages need to be incorporated into the processing pipeline. Speech recognition errors could be reduced by adapting the acoustic and language models; in particular, language model adaptation can be of great benefit given the closed-domain nature of the task. Second, recent works have provided evidence of the importance of temporal and acoustic speech parameters [5]–[7] in the automatic identification of the disease, and a complete assessment of language abilities should also include these types of characteristics. Furthermore, it is important to control some clinical and demographic variables of the samples in future studies, namely the participants' degree of literacy and the severity or stage of the disease; controlling these factors will improve the diagnostic value of the results. In addition, most studies have been performed on English data, and it is necessary to validate these results in other languages.

ACKNOWLEDGMENT

The authors want to express their gratitude to Dr. Filipa Miranda for her precious advice and help during the development of this work, and to the anonymous reviewers who significantly contributed to improving this document.

REFERENCES

[1] M. Prince, A. Wimo, M. Guerchet, G.-C. Ali, Y.-T. Wu, and M. Prina, "World Alzheimer Report 2015 - The Global Impact of Dementia. An Analysis of Prevalence, Incidence, Cost and Trends," Alzheimer's Disease International, Tech. Rep., 2015.

[2] J. Reilly, J. Troche, and M. Grossman, "Language processing in dementia," The handbook of Alzheimer's disease and other dementias, pp. 336–368, 2011.

[3] D. Kempler, "Language changes in dementia of the Alzheimer type," Dementia and communication, pp. 98–114, 1995.

[4] S. H. Ferris and M. R. Farlow, "Language impairment in Alzheimer's disease and benefits of acetylcholinesterase inhibitors," in Clinical interventions in aging, 2013.

[5] W. Jarrold, B. Peintner, D. Wilkins, D. Vergyri, C. Richey, M. L. Gorno-Tempini, and J. Ogar, "Aided diagnosis of dementia type through computer-based analysis of spontaneous speech," in Proc. of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014, pp. 27–37.

[6] G. Gosztolya, V. Vincze, L. Tóth, M. Pákáski, J. Kálmán, and I. Hoffmann, "Identifying Mild Cognitive Impairment and mild Alzheimer's disease based on spontaneous speech using ASR and linguistic features," Computer Speech & Language, vol. 53, pp. 181–197, 2019.

[7] B. Mirheidari, D. Blackburn, T. Walker, M. Reuber, and H. Christensen, "Dementia detection using automatic analysis of conversations," Computer Speech & Language, vol. 53, pp. 65–79, 2019.

[8] L. Hernández-Domínguez, E. García-Cano, S. Ratté, and G. S. Martínez, "Detection of Alzheimer's disease based on automatic analysis of common objects descriptions," in Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning, 2016, pp. 10–15.

[9] C. I. Guinn and A. Habash, "Language Analysis of Speakers with Dementia of the Alzheimer's Type," in AAAI Fall Symposium: Artificial Intelligence for Gerontechnology, 2012, pp. 8–13.

[10] K. C. Fraser, J. A. Meltzer, and F. Rudzicz, "Linguistic Features Identify Alzheimer's Disease in Narrative Speech," J Alzheimers Dis, vol. 49, no. 2, pp. 407–422, 2016.

[11] L. Hernández-Domínguez, S. Ratté, G. S. Martínez, and A. Roche-Bergua, "Computer-based evaluation of Alzheimer's disease and mild cognitive impairment patients during a picture description task," Alzheimers Dement (Amst), vol. 10, pp. 260–268, 2018.

[12] L. Santos, E. A. Corrêa Junior, O. Oliveira Jr, D. Amancio, L. Mansur, and S. Aluísio, "Enriching Complex Networks with Word Embeddings for Detecting Mild Cognitive Impairment from Speech Transcripts," in Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2017, pp. 1284–1296.

[13] C. M. Toledo, S. M. Aluísio, L. B. dos Santos, S. M. D. Brucki, E. S. Tres, M. O. de Oliveira, and L. L. Mansur, "Analysis of macrolinguistic aspects of narratives from individuals with Alzheimer's disease, mild cognitive impairment, and no cognitive impairment," Alzheimers Dement (Amst), vol. 10, pp. 31–40, 2018.

[14] A. Pompili, A. Abad, D. Martins de Matos, and I. Pavão Martins, "Topic coherence analysis for the classification of Alzheimer's disease," in Proc. IberSPEECH 2018, 2018, pp. 281–285.

[15] D. Kempler and E. Zelinski, "Language in dementia and normal aging," Dementia and normal aging, pp. 331–365, 1994.

[16] B. Croisile, B. Ska, M.-J. Brabant, A. Duchene, Y. Lepage, G. Aimard, and M. Trillet, "Comparative study of oral and written picture description in patients with Alzheimer's disease," Brain and language, vol. 53, no. 1, pp. 1–19, 1996.

[17] S. Ahmed, A.-M. F. Haigh, C. A. de Jager, and P. Garrard, "Connected speech as a marker of disease progression in autopsy-proven Alzheimer's disease," Brain, vol. 136, no. 12, pp. 3727–3737, 2013.

[18] D. N. Ripich and B. Y. Terrell, "Patterns of discourse cohesion and coherence in Alzheimer's disease," Journal of Speech and Hearing Disorders, vol. 53, no. 1, pp. 8–15, 1988.

[19] S. Kemper, E. LaBarge, F. R. Ferraro, H. Cheung, H. Cheung, and M. Storandt, "On the preservation of syntax in Alzheimer's disease: Evidence from written sentences," Archives of neurology, vol. 50, no. 1, pp. 81–86, 1993.

[20] D. P. Salmon, N. Butters, and A. S. Chan, "The deterioration of semantic memory in Alzheimer's disease," Canadian Journal of Experimental Psychology/Revue canadienne de psychologie expérimentale, vol. 53, no. 1, p. 108, 1999.

[21] A.-L. R. Adlam, S. Bozeat, R. Arnold, P. Watson, and J. R. Hodges, "Semantic knowledge in mild cognitive impairment and mild Alzheimer's disease," Cortex, vol. 42, no. 5, pp. 675–684, 2006.

[22] L. O. Shekim and L. L. LaPointe, "Production of discourse in individuals with Alzheimer's Disease," paper presented at the International Neuropsychological Society Meetings, Houston, TX, 1984.

[23] J. Appell, A. Kertesz, and M. Fisman, "A study of language functioning in Alzheimer patients," Brain and language, vol. 17, no. 1, pp. 73–91, 1982.

[24] G. Glosser and T. Deser, "Patterns of discourse production among neurological patients with fluent language disorders," Brain and language, vol. 40, no. 1, pp. 67–88, 1991.

[25] L. J. Garcia and Y. Joanette, "Analysis of conversational topic shifts: A multiple case study," Brain and Language, vol. 58, no. 1, pp. 92–114, 1997.

[26] M. Mentis, J. Briggs-Whittaker, and G. D. Gramigna, "Discourse topic management in senile dementia of the Alzheimer's type," J Speech Lang Hear Res, vol. 38, no. 5, pp. 1054–1066, 1995.

[27] M. Mentis and C. A. Prutting, "Analysis of topic as illustrated in a head-injured and a normal adult," J Speech Lang Hear Res, vol. 34, no. 3, pp. 583–595, 1991.

[28] M. Brady, C. Mackenzie, and L. Armstrong, "Topic use following right hemisphere brain damage during three semi-structured conversational discourse samples," Aphasiology, vol. 17, no. 9, pp. 881–904, 2003.

[29] C. Mackenzie, M. Brady, J. Norrie, and N. Poedjianto, "Picture description in neurologically normal adults: Concepts and topic coherence," Aphasiology, vol. 21, no. 3-4, pp. 340–354, 2007.

[30] F. Miranda, "Influência da escolaridade na dimensão macrolinguística do discurso," Master's thesis, Universidade Católica Portuguesa, Instituto de Ciências da Saúde, 2015.

[31] P. Herminia and G. Lina, "Corpus lingüístico de definiciones de categorías semánticas de personas mayores sanas y con la enfermedad del Alzheimer," Fundación BBVA, Tech. Rep., 2010.

[32] C. Pope and B. H. Davis, "Finding a balance: The Carolinas Conversation Collection," Corpus Linguistics and Linguistic Theory, vol. 7, no. 1, pp. 143–161, 2011.

[33] J. T. Becker, F. Boller, O. L. Lopez, J. Saxton, and K. L. McGonigle, "The natural history of Alzheimer's disease: description of study cohort and accuracy of diagnosis," Archives of Neurology, vol. 51, no. 6, pp. 585–594, 1994.

[34] K. A. Bayles and C. K. Tomoeda, Arizona battery for communication disorders of dementia. Canyonlands Publishing, Tucson, AZ, 1993.

[35] S. Aluísio, A. Cunha, and C. Scarton, "Evaluating Progression of Alzheimer's Disease by Regression and Classification Methods in a Narrative Language Test in Portuguese," in International Conference on Computational Processing of the Portuguese Language. Springer, 2016, pp. 109–114.

[36] H. Ulatowska and S. Chapman, Discourse Analysis and Applications: Studies in Adult Clinical Populations. Hillsdale: Lawrence Erlbaum Associates, 1994, ch. Discourse macrostructure in aphasia, pp. 29–46.

[37] H. Goodglass, E. Kaplan, and B. Barresi, The Boston Diagnostic Aphasia Examination. Baltimore: Lippincott Williams & Wilkins, 2001.

[38] S. Feng, R. Banerjee, and Y. Choi, "Characterizing stylistic elements in syntactic structure," in Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, 2012, pp. 1522–1533.

[39] "Penn Treebank Constituent Tags," (accessed on 15-January-2019). [Online]. Available: https://tinyurl.com/y3qk6fnk

[40] S. Boytcheva, P. Dobrev, and G. Angelova, "CGExtract: Towards Extraction of Conceptual Graphs from Controlled English," Contributions to ICCS-2001, 9th International Conference on Conceptual Structures, 2001.

[41] C. D. Manning, M. Surdeanu, J. Bauer, J. Finkel, S. J. Bethard, and D. McClosky, "The Stanford CoreNLP natural language processing toolkit," in Association for Computational Linguistics (ACL) System Demonstrations, 2014, pp. 55–60.

[42] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, and A. Joulin, "Advances in Pre-Training Distributed Word Representations," in Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018), 2018.

[43] A. C. Graesser, D. S. McNamara, M. M. Louwerse, and Z. Cai, "Coh-Metrix: Analysis of text on cohesion and language," Behavior research methods, instruments, & computers, vol. 36, no. 2, pp. 193–202, 2004.

[44] R. S. Bucks, S. Singh, J. M. Cuerden, and G. K. Wilcock, "Analysis of spontaneous, conversational speech in dementia of Alzheimer type: Evaluation of an objective technique for analysing lexical performance," Aphasiology, vol. 14, no. 1, pp. 71–91, 2000.

[45] K. Kettunen, "Can type-token ratio be used to show morphological complexity of languages?" Journal of Quantitative Linguistics, vol. 21, no. 3, pp. 223–245, 2014.

[46] E. Brunet, Le vocabulaire de Jean Giraudoux. Structure et évolution. Slatkine, 1978, no. 1.

[47] A. Honoré, Some Simple Measures of Richness of Vocabulary. Association of Literary and Linguistic Computing Bulletin, 1978, no. 7.

[48] D. Klein and C. D. Manning, "Accurate unlexicalized parsing," in Proceedings of the 41st Annual Meeting on Association for Computational Linguistics - Volume 1. Association for Computational Linguistics, 2003, pp. 423–430.

[49] S. O. Orimaye, J. S.-M. Wong, and K. J. Golden, "Learning predictive linguistic features for Alzheimer's disease and related dementias using verbal utterances," in Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality, 2014, pp. 78–87.

[50] P. Pudil, J. Novovičová, and J. Kittler, "Floating search methods in feature selection," Pattern recognition letters, vol. 15, no. 11, pp. 1119–1125, 1994.

[51] G. Kavé and M. Goral, "Word retrieval in picture descriptions produced by individuals with Alzheimer's disease," Journal of clinical and experimental neuropsychology, vol. 38, no. 9, pp. 958–966, 2016.

[52] S. Ahmed, C. A. de Jager, A.-M. Haigh, and P. Garrard, "Semantic processing in connected speech at a uniformly early stage of autopsy-confirmed Alzheimer's disease," Neuropsychology, vol. 27, no. 1, p. 79, 2013.

[53] A. Stolcke and J. Droppo, "Comparing Human and Machine Errors in Conversational Speech Transcription," in Proc. Interspeech, 2017, pp. 137–141.

[54] M. Lehr, E. Prud'hommeaux, I. Shafran, and B. Roark, "Fully Automated Neuropsychological Assessment for Detecting Mild Cognitive Impairment," in Thirteenth Annual Conference of the International Speech Communication Association, 2012.

[55] "Google Cloud Speech-to-Text API," (accessed on 15-January-2019). [Online]. Available: https://cloud.google.com/speech-to-text/

[56] S. Meignier and T. Merlin, "LIUM SpkDiarization: an open source toolkit for diarization," in CMU SPUD Workshop, 2010.

Anna Pompili graduated in Information Systems and Computer Engineering (2011) from Tor Vergata, Rome. She received a Masters Degree in Information Systems and Computer Engineering (2013) from Instituto Superior Técnico, Lisbon. Her current research interests focus on speech and language technologies applied to the diagnosis and therapy of brain diseases.

Alberto Abad received the Telecommunication Engineering degree from the Technical University of Catalonia (UPC), Barcelona, Spain, in 2002 and the Ph.D. degree from UPC in 2007. His current research interests include robust speech recognition, speaker and language identification, computational acoustic scene analysis, privacy-preserving speech processing, and health-care applications. Dr. Abad is a member of the International Speech Communication Association (ISCA).

David Martins de Matos graduated in Electrical and Computer Engineering (1990) from Instituto Superior Técnico (IST), Lisbon. He received a Masters Degree in Electrical and Computer Engineering (1995) (IST). He received a Doctor of Engineering Degree in Systems and Computer Science (2005) (IST). His current research interests focus on computational music processing, automatic summarization and natural language generation, human-robot interaction, and natural language semantics.

Isabel Pavão Martins is a neurologist who completed her medical training in Neurology at the Hospital de Sta Maria in Lisbon and also at the National Hospital for Nervous Diseases in London. She is currently Associate Professor of Neurology at the Faculty of Medicine, University of Lisbon, Investigator at the Instituto de Medicina Molecular, and Consultant Neurologist at the Hospital de Sta Maria. She was President of the Portuguese Society of Neurology (2008-2010) and President of the Pedagogic Council of the Lisbon Faculty of Medicine (2015-2017). Her main research interests are Behavioral Neurology and Headache, and she has 150 scientific publications.