Original Paper
A Framework for Applying Natural Language Processing in Digital
Health Interventions
Burkhardt Funk1*, PhD; Shiri Sadeh-Sharvit2,3*, PhD; Ellen E Fitzsimmons-Craft4, PhD; Mickey Todd Trockel3, PhD;
Grace E Monterubio4, MA; Neha J Goel2,3, MA; Katherine N Balantekin4,5, PhD; Dawn M Eichen4,6, PhD; Rachael
E Flatt2,3,7, BSc; Marie-Laure Firebaugh4, LMSW; Corinna Jacobi8, PhD; Andrea K Graham9, PhD; Mark
Hoogendoorn10, PhD; Denise E Wilfley4, PhD; C Barr Taylor2,3, MD
1Leuphana University, Institute of Information Systems, Lueneburg, Germany
2Palo Alto University, Center for m2Health, Palo Alto, CA, United States
3Stanford University, Department of Psychiatry and Behavioral Sciences, Stanford, CA, United States
4Washington University in St Louis, Department of Psychiatry, St Louis, MO, United States
5University at Buffalo, Department of Exercise and Nutrition Sciences, Buffalo, NY, United States
6University of California San Diego, Department of Pediatrics, San Diego, CA, United States
7University of North Carolina at Chapel Hill, Department of Psychology and Neurosciences, Chapel Hill, NC, United States
8Technische Universität, Institute of Clinical Psychology and Psychotherapy, Dresden, Germany
9Northwestern University, Department of Medical Social Sciences, Chicago, IL, United States
10Vrije Universiteit, Department of Computer Science, Amsterdam, Netherlands
*these authors contributed equally
Corresponding Author:
Burkhardt Funk, PhD
Leuphana University
Institute of Information Systems
Universitaetsallee 1
Lueneburg, 21335
Germany
Phone: XXXXXXXXXX
Email: XXXXXXXXXX
Abstract
Background: Digital health interventions (DHIs) are poised to reduce target symptoms in a scalable, affordable, and empirically
supported way. DHIs that involve coaching or clinical support often collect text data from 2 sources: (1) open correspondence
between users and the trained practitioners supporting them through a messaging system and (2) text data recorded during the
intervention by users, such as diary entries. Natural language processing (NLP) offers methods for analyzing text, augmenting
the understanding of intervention effects, and informing therapeutic decision making.
Objective: This study aimed to present a technical framework that supports the automated analysis of both types of text data
often present in DHIs. This framework generates text features and helps to build statistical models to predict target variables,
including user engagement, symptom change, and therapeutic outcomes.
Methods: We first discussed various NLP techniques and demonstrated how they are implemented in the presented framework.
We then applied the framework in a case study of the Healthy Body Image Program, a Web-based intervention trial for eating
disorders (EDs). A total of 372 participants who screened positive for an ED received a DHI aimed at reducing ED psychopathology
(including binge eating and purging behaviors) and improving body image. These users generated 37,228 intervention text snippets
and exchanged 4285 user-coach messages, which were analyzed using the proposed model.
Results: We applied the framework to predict binge eating behavior, resulting in an area under the curve between 0.57 (when
applied to new users) and 0.72 (when applied to new symptom reports of known users). In addition, initial evidence indicated
that specific text features predicted the therapeutic outcome of reducing ED symptoms.
J Med Internet Res 2020 | vol. 22 | iss. 2 | e13855 | https://www.jmir.org/2020/2/e13855
Funk et al, JOURNAL OF MEDICAL INTERNET RESEARCH
Conclusions: The case study demonstrates the usefulness of a structured approach to text data analytics. NLP techniques improve
the prediction of symptom changes in DHIs. We present a technical framework that can be easily applied in other clinical trials
and clinical presentations and encourage other groups to apply the framework in similar contexts.
(J Med Internet Res 2020;22(2):e13855) doi: 10.2196/13855
KEYWORDS
Digital Health Interventions Text Analytics (DHITA); digital health interventions; eating disorders; guided self-help; natural
language processing; text mining
Introduction
Digitally delivered interventions for mental disorders have the
potential to reduce the mental health burden worldwide [1].
Efficacious online and mobile phone app–based programs can
overcome barriers to treatment such as stigma, reach, access,
cost, and the scarcity of professionals trained in empirically
supported interventions [2]. Furthermore, digital health
interventions (DHIs) are more scalable, potentially allowing one
professional to manage a large number of individuals [3]. As
DHIs are increasingly used, new data analytics capabilities are
needed to evaluate treatment outcomes and mechanisms of
engagement and symptom reduction [4].
Most DHIs collect structured data that are pertinent to assessing
adherence to the intervention and symptom change over time,
including symptom severity scales, number of sessions
completed, and number of times the program was accessed [5].
Digital guided self-help interventions, a type of DHI, also
incorporate a trained practitioner (coach) who facilitates the
user’s learning of the intervention material, monitors progress,
and helps troubleshoot barriers to change. This allows for the
collection of rich, in-depth text data that could augment the
understanding of intervention efficacy and inform the
development and refinement of future programs. Such datasets
include texts generated through direct communication between
users and their facilitators through a digital platform. Another
source of information comes from text that users record during the
intervention, for example, free-text diary entries and posts
authored on intervention-related group chats and discussion
boards [6]. Data analytic approaches, therefore, could benefit
from cultivating an overarching perspective on methods to apply
for studying the text data emerging from technology-delivered
programs.
Hereafter, we provide a brief review of the use of text analytics
methods in DHIs. Then, we propose a framework for applying
natural language processing (NLP) in this field and demonstrate
its application in a test case of an online intervention for eating
disorders (EDs), delivered as part of the Healthy Body Image
(HBI) Program trial [7].
Methods
Natural Language Processing in Mental Health
Interventions
NLP is a rapidly evolving interdisciplinary field that studies
human language content and its use in predicting human
behavior [8]. NLP uses computational models to
analyze unstructured, user-generated text to identify patterns
and related outcomes (eg, a change in target symptoms) [9]. If
proven effective, NLP models may ultimately enable the design
of automated chatbots in person-machine communication [10].
Although the use of NLP in consumer and online search
behavior is well established [11], it has only recently been
utilized in mental health research [12].
Text data analytics can inform clinical decisions, particularly
when professionals have many data points at their disposal, but
each characteristic has weak predictive potency [13]. Using
NLP models, researchers have shown, for instance, that text
communications can predict an increase in psychiatric symptoms
[14], that text data in electronic medical records can effectively
predict treatment outcomes [5], and that patients’ reviews of
the care they receive can provide important insights for
stakeholders [15]. Furthermore, when analyzing text data,
machine learning algorithms demonstrated greater accuracy
than mental health professionals in distinguishing between
suicide notes written by suicide completers and controls [16].
A similar approach has also been utilized in understanding
medical risks through NLP of electronic medical records [17].
NLP strategies have also been applied to analyze text data from
social media in the context of mental health. For instance,
Coppersmith et al [18] detected quantifiable signals of mental
disorders through analyses of text data available on Twitter.
NLP is also effective in using text messages exchanged with a
crisis intervention service to predict outcomes [8].
Computational discourse analysis methods have been employed
to develop insights on what constitutes effective counseling text
conversations as well [19]. Similarly, by analyzing patterns of
the words, sentiments, topics, and style of messages used,
Hoogendoorn et al [12] found a correlation between several text
features and social anxiety in an online treatment. However,
research on the clinical applicability of NLP models is still in
its early stages [10]. For example, Miner et al [20] have shown
that currently available smartphone-based conversational agents
(eg, Apple’s Siri), which many individuals use to search health
information [21], are not equipped to respond effectively to
users’ inquiries about mental health. Considering the potential
of text data to inform and enrich both clinicians and clients, the
development and refinement of NLP tools should be a significant
public health priority.
Proposed Framework
NLP offers a useful set of tools for analyzing text data generated
in DHIs and for building predictive models. NLP can clarify
the mechanisms mediating the effects of online interventions
as well as improve and personalize DHIs, leading ultimately to
further automation of technology-delivered programs and lower
costs [22]. Free text in DHIs may come from 2 sources. First,
information about users’ thoughts, emotions, and behaviors is
collected via open-ended questions embedded within the
program (eg, “Hey [user], after learning about triggers, can you
identify two of your common triggers for binge eating?”).
Applying NLP techniques to this type of text data makes it possible
to build predictive models, for instance, for calculating
individual mood symptoms and symptom trajectories [23].
Second, in guided self-help interventions, users and coaches
exchange messages for problem solving, engaging users,
providing supplemental information, and individualizing the
intervention.
In DHIs, each text snippet, that is, a free-text segment, is
associated with a specific user and has a unique time stamp.
Figure 1 represents an exemplified user journey and shows the
time interval a user spends within a DHI. Each filled symbol
on the timeline represents a text snippet where the shape and
color reflect the text classes (eg, a message from a user). Text
snippets are not the only elements of a user’s journey; instead,
structured touchpoints (indicated by open circles in Figure 1)
complete the data associated with specific users. A touchpoint
is, broadly speaking, an interaction of the user with the DHI.
Besides text messages exchanged between users and coaches,
this includes symptom severity scales.
Figure 1. Text fragments along an exemplified user journey of a specific user i (vertical dots refer to other users); open circles refer to other nontext
touchpoints and the interaction of the user with the digital health intervention; upward pointing triangles refer to fragments from diaries; red squares
refer to the messages sent by coaches; black squares refer to the messages sent by users; and downward pointing triangles refer to the data collected
within specific exercises (eg, deep breathing).
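The user journey described above can be illustrated with a small data structure. The following Python sketch is not part of the presented framework; the class name Touchpoint, its fields, the kind labels, and the example records are all hypothetical and chosen only to mirror the description: every touchpoint belongs to one user and carries a time stamp, and a touchpoint counts as a text snippet when it holds free text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Touchpoint:
    """One interaction of a user with the DHI (hypothetical schema)."""
    user_id: str
    timestamp: datetime
    kind: str                   # eg "diary", "user_message", "coach_message", "symptom_scale"
    text: Optional[str] = None  # None for structured (nontext) touchpoints

    @property
    def is_text_snippet(self) -> bool:
        return self.text is not None


def user_journey(touchpoints: List[Touchpoint], user_id: str) -> List[Touchpoint]:
    """Return all touchpoints of one user in chronological order."""
    own = [t for t in touchpoints if t.user_id == user_id]
    return sorted(own, key=lambda t: t.timestamp)


def text_snippets(journey: List[Touchpoint]) -> List[Touchpoint]:
    """Keep only the touchpoints that carry free text."""
    return [t for t in journey if t.is_text_snippet]


# Invented example records, for illustration only.
events = [
    Touchpoint("u1", datetime(2020, 1, 3, 9, 0), "coach_message", "How did the week go?"),
    Touchpoint("u1", datetime(2020, 1, 1, 20, 0), "diary", "Skipped lunch, felt stressed."),
    Touchpoint("u2", datetime(2020, 1, 2, 8, 0), "user_message", "Thanks!"),
    Touchpoint("u1", datetime(2020, 1, 2, 10, 0), "symptom_scale"),
]

journey = user_journey(events, "u1")
snippets = text_snippets(journey)
```

Keeping structured touchpoints and text snippets in one time-ordered list per user makes it straightforward to relate a snippet to the symptom reports that precede or follow it.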
The analysis of texts in DHIs encompasses 2 steps (Figure 2).
The first step, feature engineering, concentrates on preprocessing
the text data to identify structured features (free texts cannot be
directly used by machine learning algorithms). These features
form a numerical vector of typically fixed length that represents
each snippet and can be used to estimate statistical models. In
the second step, predictive modeling, models are constructed
to infer and predict either short-term symptom change or overall
therapeutic outcomes. Information