HDDM: Hierarchical Bayesian estimation of the Drift-Diffusion Model in Python
Thomas V. Wiecki, Imri Sofer, Michael J. Frank
Correspondence:
The diffusion model is a commonly used tool to infer latent
psychological processes underlying decision making, and to link them
to neural mechanisms. Although efficient open source software has been
made available to quantitatively fit the model to data, current
estimation methods require an abundance of reaction time measurements
to recover meaningful parameters, and only provide point estimates of
each parameter.
In contrast, hierarchical Bayesian parameter
estimation methods are useful for enhancing statistical power,
allowing for simultaneous estimation of individual subject parameters
and the group distribution that they are drawn from, while also
providing measures of uncertainty in these parameters in the posterior
distribution. Here, we present a novel Python-based toolbox called
HDDM (hierarchical drift diffusion model), which allows fast and
flexible estimation of the drift-diffusion model and the related
linear ballistic accumulator model. HDDM requires less data per
subject/condition than non-hierarchical methods, allows for full
Bayesian data analysis, and can handle outliers in the data.
HDDM supports the estimation of how trial-by-trial measurements
(e.g. fMRI) influence decision making parameters. This paper will
first describe the theoretical background of the drift-diffusion model and
Bayesian inference. We then illustrate usage of the toolbox on a
real-world data set from our lab. The software and documentation can
be downloaded at:
Introduction
Sequential sampling models (SSMs) () have
established themselves as the de-facto standard for modeling
reaction-time data from simple two-alternative forced choice decision
making tasks (). Each decision is modeled as an
accumulation of noisy information indicative of one choice or the
other, with sequential evaluation of the accumulated evidence at each
time step. Once this evidence crosses a threshold, the corresponding
response is executed. This simple assumption about the underlying
psychological process has the appealing property of reproducing not
only choice probabilities, but the full distribution of response times
for each of the two choices. Models of this class have been used
successfully in mathematical psychology since the 60s and more
recently adopted in cognitive neuroscience investigations. These
studies are typically interested in neural mechanisms associated with
the accumulation process or for regulating the decision threshold (see
). One issue in such model-based
cognitive neuroscience approaches is that the trial numbers in each
condition are often low, making it difficult to estimate
model parameters. For example, studies with patient populations,
especially if combined with intraoperative recordings, typically have
substantial constraints on the duration of the task. Similarly,
model-based fMRI or EEG studies are often interested not in static
model parameters, but how these dynamically vary with trial-by-trial
variations in recorded brain activity. Efficient and reliable
estimation methods that take advantage of the full statistical
structure available in the data across subjects and conditions are
critical to the success of these endeavors.
Bayesian data analytic methods are quickly gaining popularity in the
cognitive sciences because of their many desirable properties
(, ). First, Bayesian methods
allow inference of the full posterior distribution of each parameter,
thus quantifying uncertainty in their estimation, rather
than simply providing their most likely value. Second, hierarchical modeling is
naturally formulated in a Bayesian framework. Traditionally,
psychological models either assume subjects are completely independent
of each other, fitting models separately to each individual, or that
all subjects are the same, fitting models to the group as if they
are all copies of some “average subject”. Both approaches are sub-optimal in
that the former fails to capitalize on statistical strength offered by
the degree to which subjects are similar in one or more model
parameters, whereas the latter approach fails to account for the
differences among subjects, and hence could lead to a situation where
the estimated model cannot fit any individual subject. The same limitations
apply to current DDM software packages such as
. Hierarchical Bayesian methods provide a remedy for
this problem by allowing group and subject parameters to be estimated
simultaneously at different hierarchical levels
(, , ). Subject parameters are
assumed to be drawn from a group distribution, and to the degree that
subjects are similar to each other, the variance in the group
distribution will be estimated to be small, which reciprocally has a
greater influence on constraining parameter estimates of any
individual. Even in this scenario, the method still allows the
posterior for any given individual subject to differ substantially
from that of the rest of the group given sufficient data to overwhelm
the group prior. Thus the method capitalizes on statistical strength
shared across the individuals, and can do so to different degrees even
within the same sample and model, depending on the extent to which
subjects are similar to each other in one parameter vs. another. In
the DDM for example, it may be the case that there is relatively
little variability across subjects in the perceptual time for stimulus
encoding, quantified by the “non-decision time” but more variability
in their degree of response caution, quantified by the “decision
threshold”. The estimation should be able to capitalize on this
structure so that the non-decision time in any given subject is
anchored by that of the group, potentially allowing for more efficient
estimation of that subject's decision threshold. This approach may be
particularly helpful when relatively few trials per condition are
available for each subject, and when incorporating noisy
trial-by-trial neural data into the estimation of DDM parameters.
HDDM is an open-source software package written in Python which
allows (i) the flexible construction of hierarchical Bayesian drift
diffusion models and (ii) the estimation of their posterior parameter
distributions via
(). User-defined
models can be created via a simple python script or be used
interactively via, for example,
interpreter shell (:cite:PER-GRA2007). All
run-time critical functions are coded in
() and compiled natively for speed
which allows estimation of complex models in minutes. HDDM includes
many commonly used statistics and plotting functionality generally
used to assess model fit. The code is released under the permissive
BSD 3-clause license, test-covered to ensure correct behavior and well
documented. Finally, HDDM allows flexible estimation of trial-by-trial
regressions where an external measurement (e.g. brain activity as
measured by fMRI) is correlated with one or more decision making
parameters.
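A stylized sketch of such a trial-by-trial regression follows; the variable names, the linear link, and the one-sample "evidence" stand-in (instead of a full RT likelihood) are assumptions of this illustration, not HDDM's implementation:

```python
import numpy as np

# Stylized sketch: the per-trial drift-rate is modeled as a linear
# function of a recorded covariate; the slope is then recoverable by
# ordinary regression. All names here are hypothetical.
rng = np.random.default_rng(3)
n_trials = 500
theta = rng.standard_normal(n_trials)   # e.g. trialwise EEG theta power
v_trial = 0.5 + 0.4 * theta             # assumed link: v_i = v0 + beta * theta_i
evidence = rng.normal(v_trial, 1.0)     # noisy per-trial evidence sample
beta_hat = np.polyfit(theta, evidence, 1)[0]   # recovered slope, near the true 0.4
```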
With HDDM we aim to provide a user-friendly but powerful tool that can
be used by experimentalists to construct and fit complex,
user-specified models using state-of-the-art estimation methods to
test their hypotheses. The purpose of this report is to introduce the
toolbox and provide a tutorial; subsequent
reports will quantitatively characterize its success in recovering
model parameters and advantages relative to non-hierarchical or
non-Bayesian methods as a function of the number of subjects and
trials (:cite: SoferWieckiFrank).
Sequential Sampling Models
SSMs generally fall into one of two classes: (i) diffusion models
which assume that relative evidence is accumulated over time
and (ii) race models which assume independent evidence accumulation
and response commitment once the first accumulator crosses a boundary
(, ). Currently, HDDM includes two of the most
commonly used SSMs: the drift diffusion model (DDM)
(, ) belonging to the
class of diffusion models and the linear ballistic accumulator (LBA)
() belonging to the class of race models.
Drift Diffusion Model
The DDM models decision making in two-choice tasks. Each choice is
represented as an upper and lower boundary. A drift-process
accumulates evidence over time until it crosses one of the two
boundaries and initiates the corresponding response
(, ). The speed with
which the accumulation process approaches one of the two boundaries is
called drift-rate v and represents the relative evidence for or
against a particular response. Because there is noise in the drift
process, the time of the boundary crossing and the selected response
will vary between trials. The distance between the two boundaries
(i.e. threshold a) influences how much evidence must be accumulated
until a response is executed. A lower threshold makes responding
faster in general but increases the influence of noise on decision
making and can hence lead to errors or impulsive choice, whereas a
higher threshold leads to more cautious responding (slower, more
skewed RT distributions, but more accurate). Response time, however,
is not solely comprised of the decision making process – perception,
movement initiation and execution all take time and are lumped in the
DDM by a single non-decision time parameter t. The model also allows
for a prepotent bias z affecting the starting point of the drift
process relative to the two boundaries. The termination times of this
generative process give rise to the reaction time distributions of
both choices.
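The generative process just described is easy to simulate directly. The following sketch integrates the drift process by Euler-Maruyama; the noise scale sigma and step size dt are assumptions of this illustration:

```python
import numpy as np

def simulate_ddm(v, a, z, t, n_trials=1000, dt=1e-3, sigma=1.0, seed=0):
    """Euler-Maruyama simulation of the drift process described above.
    v: drift-rate, a: boundary separation, z: starting point as a
    fraction of a, t: non-decision time. Returns RTs and choices
    (1 = upper boundary, 0 = lower boundary)."""
    rng = np.random.default_rng(seed)
    rts = np.empty(n_trials)
    choices = np.empty(n_trials, dtype=int)
    for i in range(n_trials):
        x, elapsed = z * a, 0.0
        while 0.0 < x < a:
            # accumulate noisy evidence in small time steps
            x += v * dt + sigma * np.sqrt(dt) * rng.standard_normal()
            elapsed += dt
        rts[i] = elapsed + t        # add non-decision time
        choices[i] = int(x >= a)
    return rts, choices
```

With a positive drift-rate most trajectories terminate at the upper boundary, and the resulting RT histograms show the characteristic right skew.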
Trajectories of multiple drift-process (blue and red lines,
middle panel). Evidence is accumulated over time (x-axis) with
drift-rate v until one of two boundaries (separated by
threshold a) is crossed and a response is initiated. Upper (blue)
and lower (red) panels contain histograms over
boundary-crossing-times for two possible responses. The histogram
shapes match closely to that observed in reaction time
measurements of research participants.
An analytic solution to the resulting probability distribution of
the termination times was provided by :
\[f(t|v, a, z) = \frac{\pi}{a^2} \, \text{exp} \left( -vaz-\frac{v^2\,t}{2} \right) \times \sum_{k=1}^{\infty} k\, \text{exp} \left( -\frac{k^2\pi^2 t}{2a^2} \right) \text{sin}\left(k\pi z\right)\]
Since the formula contains an infinite sum, HDDM uses an approximation
provided by .
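For illustration, the series can simply be truncated; this is a minimal and numerically naive sketch, not the faster approximation HDDM actually uses:

```python
import math

def ddm_density(rt, v, a, z, n_terms=100):
    """Truncation of the infinite-series density f(t|v, a, z) above
    (first-passage times; z is the starting point as a fraction of a).
    Small rt values need many terms, which is why a naive truncation
    is slow compared to the approximation HDDM uses."""
    prefactor = (math.pi / a ** 2) * math.exp(-v * a * z - v ** 2 * rt / 2.0)
    series = sum(
        k * math.exp(-k ** 2 * math.pi ** 2 * rt / (2.0 * a ** 2))
        * math.sin(k * math.pi * z)
        for k in range(1, n_terms + 1)
    )
    return prefactor * series
```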
Later on, the DDM was extended to include additional noise parameters
capturing inter-trial variability in the drift-rate, the non-decision
time and the starting point in order to account for two phenomena
observed in decision making tasks, most notably cases where errors are
faster or slower than correct responses. Models that take this into
account are referred to as the full DDM
(). HDDM uses analytic integration of the
likelihood function for variability in drift-rate and numerical
integration for variability in non-decision time and bias. More
information on the model specifics can be found in :cite: SoferWieckiFrank.
Linear Ballistic Accumulator
The Linear Ballistic Accumulator (LBA) model belongs to the class of
race models (). Instead of one drift process
and two boundaries, the LBA contains one drift process for each
possible response with a single boundary each. Thus, the LBA can model
decision making when more than two responses are possible. Moreover,
unlike the DDM, the LBA drift process has no intra-trial variance. RT
variability is obtained by including inter-trial variability in the
drift-rate and the starting point distribution. Note that the
assumption of a noiseless drift-process simplifies the math
significantly, leading to a computationally more efficient
likelihood function for this model.
In a simulation study it was shown that the LBA and DDM lead to
similar results as to which parameters are affected by certain
manipulations ().
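A minimal simulation makes the LBA's structure concrete; all parameter values below (boundary b, start-point range A, drift means and spread) are assumptions of this sketch:

```python
import numpy as np

def simulate_lba(n_trials, b=1.0, A=0.5, drift_means=(1.0, 0.6),
                 drift_sd=0.3, t0=0.2, seed=0):
    """Minimal LBA sketch: each accumulator rises linearly (no
    intra-trial noise) from a uniform start in [0, A], with a drift
    drawn per trial from a normal; the first to reach boundary b
    determines the response. All parameter values are illustrative."""
    rng = np.random.default_rng(seed)
    n_acc = len(drift_means)
    starts = rng.uniform(0.0, A, size=(n_trials, n_acc))
    drifts = rng.normal(drift_means, drift_sd, size=(n_trials, n_acc))
    drifts = np.clip(drifts, 1e-6, None)   # keep accumulators moving upward
    times = (b - starts) / drifts          # linear, deterministic rise
    choices = times.argmin(axis=1)         # first accumulator to finish
    rts = times.min(axis=1) + t0           # plus non-decision time
    return rts, choices
```

Because the rise is deterministic within a trial, each RT is a simple ratio, which is what makes the LBA likelihood cheap to evaluate.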
Two linear ballistic accumulators (left and right) with different
noiseless drifts (arrows) sampled from a normal distribution
initiated at different starting points sampled from a uniform
distribution. In this case, the accumulator for response
alternative 1 is more likely to reach the criterion first, and
therefore gets selected more often. Because of this race between
two accumulators towards a common threshold these models are called
race models. Reproduced from .
Hierarchical Bayesian Estimation
Statistics and machine learning have developed efficient and versatile
Bayesian methods to solve various inference problems
. More recently, they have seen wider adoption in
applied fields such as genetics and
psychology. One reason for this
Bayesian revolution is the ability to quantify the certainty one has
in a particular estimation. Moreover, hierarchical Bayesian models
provide an elegant solution to the problem of estimating parameters of
individual subjects and groups of subjects, as outlined above. Under the assumption that
participants within each group are similar to each other, but not
identical, a hierarchical model can be constructed where individual
parameter estimates are constrained by group-level distributions.
Bayesian methods require specification of a generative process in form
of a likelihood function that produced the observed data \(x\)
given some parameters \(\theta\). By specifying our prior beliefs
(which can be informative or uninformative) we can use Bayes' formula to
invert the generative model and make inference on the probability of
parameters \(\theta\):
\[P(\theta|x) = \frac{P(x|\theta) \times P(\theta)}{P(x)}\]
Where \(P(x|\theta)\) is the likelihood of observing the data (in
this case choices and RTs) given each parameter value and
\(P(\theta)\) is the prior probability of the parameters. In most
cases the computation of the denominator is quite complicated and
requires to compute an analytically intractable integral. Sampling
methods like Markov-Chain Monte Carlo (MCMC)
circumvent this problem by providing a way to produce samples from the
posterior distribution. These methods have been used with great
success in many different scenarios
and will be discussed in more detail below.
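The idea can be illustrated with a minimal Metropolis sampler for the mean of a normal likelihood; the prior and proposal scale below are assumptions of the sketch:

```python
import numpy as np

def metropolis(data, n_samples=5000, proposal_sd=0.5, seed=1):
    """Minimal Metropolis sampler for the mean of a normal likelihood
    with unit variance and a wide N(0, 10^2) prior. Only posterior
    *ratios* are needed, so the intractable denominator P(x) never
    has to be evaluated."""
    rng = np.random.default_rng(seed)

    def log_post(mu):
        # log P(x|mu) + log P(mu), up to a constant
        return -0.5 * np.sum((data - mu) ** 2) - 0.5 * (mu / 10.0) ** 2

    mu, trace = 0.0, []
    for _ in range(n_samples):
        prop = mu + proposal_sd * rng.standard_normal()
        # accept with probability min(1, P(prop|x) / P(mu|x))
        if np.log(rng.uniform()) < log_post(prop) - log_post(mu):
            mu = prop
        trace.append(mu)
    return np.array(trace)

# sanity check on synthetic data with true mean 2.0
data = np.random.default_rng(0).normal(2.0, 1.0, size=100)
trace = metropolis(data)
```

After discarding early samples as burn-in, the remaining draws approximate the posterior over the mean.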
As noted above, the Bayesian method lends itself naturally to a
hierarchical design. In such a design, parameters for one distribution
can themselves be drawn from a higher level distribution. This
hierarchical property has a particular benefit to cognitive modeling
where data is often scarce. We can construct a hierarchical model to
more adequately capture the likely similarity structure of our
data. As above, observed data points of each subject \(x_{i,j}\)
(where \(i = 1, \dots, S_j\) data points per subject and \(j
= 1, \dots, N\) for \(N\) subjects) are distributed according to
some likelihood function \(f | \theta\).
We now assume that
individual subject parameters \(\theta_j\) are normally
distributed around a group mean with a specific group variance
(\(\lambda = (\mu, \sigma)\), where these group parameters are
estimated from the data given hyper-priors \(G_0\)), resulting in
the following generative description:
\[\begin{split}\mu, \sigma &\sim G_0() \\
\theta_j &\sim \mathcal{N}(\mu, \sigma^2) \\
x_{i, j} &\sim f(\theta_j)\end{split}\]
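This generative description can be forward-simulated directly; the concrete hyperpriors standing in for \(G_0\) and the unit-variance normal likelihood \(f\) below are assumptions chosen for illustration:

```python
import numpy as np

def generate_hierarchical(n_subj=8, n_per_subj=50, seed=2):
    """Forward-simulate the hierarchical generative model above.
    The concrete hyperpriors standing in for G_0 and the normal
    likelihood f are assumptions of this illustration."""
    rng = np.random.default_rng(seed)
    mu = rng.normal(0.0, 1.0)                        # mu ~ G_0
    sigma = rng.uniform(0.5, 1.5)                    # sigma ~ G_0
    theta = rng.normal(mu, sigma, size=n_subj)       # theta_j ~ N(mu, sigma^2)
    x = rng.normal(theta[:, None], 1.0,
                   size=(n_subj, n_per_subj))        # x_ij ~ f(theta_j)
    return mu, sigma, theta, x

mu, sigma, theta, x = generate_hierarchical()
```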
Graphical notation of a hierarchical model. Circles represent
continuous random variables. Arrows connecting circles specify
conditional dependence between random variables. Shaded circles
represent observed data. Finally, plates around graphical nodes
mean that multiple identical, independent distributed random
variables exist.
Another way to look at this hierarchical model is to consider that our
fixed prior on \(\theta\) from above is actually
a random variable (in our case a normal distribution) parameterized by
\(\lambda\) which leads to the following posterior formulation:
\[P(\theta, \lambda | x) = \frac{P(x|\theta) \times P(\theta|\lambda) \times P(\lambda)}{P(x)}\]
Note that we can factorize \(P(x|\theta)\) and
\(P(\theta|\lambda)\) due to their conditional independence. This
formulation also makes apparent that the posterior contains estimation
of the individual subject parameters \(\theta_j\) and group
parameters \(\lambda\).
Hierarchical Drift-Diffusion Models used in HDDM
HDDM includes several hierarchical Bayesian model formulations for the
DDM and LBA. For illustrative purposes we present the graphical model
depiction of a hierarchical DDM model with informative priors and
group-only inter-trial variability parameters. Note, however, that
there is also a model with non-informative priors.
Basic graphical hierarchical model implemented by HDDM for
estimation of the drift-diffusion model.
Individual graphical nodes are distributed as follows.
\[\begin{split}\mu_{a} &\sim \mathcal{G}(1.5, 0.75) \\
\mu_{v} &\sim \mathcal{N}(2, 3) \\
\mu_{z} &\sim \mathcal{N}(0.5, 0.5) \\
\mu_{ter} &\sim \mathcal{G}(0.4, 0.2) \\
\sigma_{a} &\sim \mathcal{HN}(0.1) \\
\sigma_{v} &\sim \mathcal{HN}(2) \\
\sigma_{z} &\sim \mathcal{HN}(0.05) \\
\sigma_{ter} &\sim \mathcal{HN}(1) \\
a_{j} &\sim \mathcal{G}(\mu_{a}, \sigma_{a}^2) \\
z_{j} &\sim \text{invlogit}(\mathcal{N}(\mu_{z}, \sigma_{z}^2)) \\
v_{j} &\sim \mathcal{N}(\mu_{v}, \sigma_{v}^2) \\
ter_{j} &\sim \mathcal{N}(\mu_{ter}, \sigma_{ter}^2) \\
sv &\sim \mathcal{HN}(2) \\
ster &\sim \mathcal{HN}(0.3) \\
sz &\sim \mathcal{B}(1, 3) \\
x_{i, j} &\sim F(a_{j}, z_{j}, v_{j}, ter_{j}, sv, ster, sz)\end{split}\]
where \(x_{i, j}\) represents the observed data consisting of
reaction time and choice and \(F\) represents the DDM likelihood
function as formulated by . \(\mathcal{N}\)
represents a normal distribution parameterized by mean and standard
deviation, \(\mathcal{HN}\) represents a half-normal parameterized
by its standard deviation, \(\mathcal{G}\) represents a Gamma
distribution parameterized by mean and rate, \(\mathcal{B}\)
represents a Beta distribution parameterized by \(\alpha\) and
\(\beta\). Note that in this model we do not attempt to estimate
individual parameters for inter-trial variabilities. The reason is
that the influence of these parameters onto the likelihood is often so
small that very large amounts of data would be required to make
meaningful inference at the individual level.
These priors are created to roughly match parameter values reported in
the literature and collected by . In the
below figure we overlaid those empirical values with the prior
distribution used for each parameter.
HDDM then uses MCMC to estimate the joint posterior distribution of
all model parameters.
Note that the exact form of the model will be user-specified; consider
as an example a model where separate drift-rates v are estimated for
two conditions in an experiment: easy and hard. In this case, HDDM
will create a hierarchical model with group parameters
\(\mu_{v_{\text{easy}}}\), \(\sigma_{v_{\text{easy}}}\),
\(\mu_{v_{\text{hard}}}\), \(\sigma_{v_{\text{hard}}}\), and individual subject parameters \(v_{j_{\text{easy}}}\) and \(v_{j_{\text{hard}}}\).
Real World Example
In the following we will show an example session of using HDDM to
analyze a real-world dataset. The main purpose is to provide an overview
of some of the functionality and interface. By no means, however, is it a
complete overview of all the functionality in HDDM. For more
information, including on how to use HDDM as a command-line utility, we
refer to the online tutorial at
and the how-to at
. For a reference manual,
First, we have to import the modules we are going to use so that they
are available in our namespace. Pandas provides a table-oriented
data-structure and matplotlib is a module for generating graphs and plots.
import pandas as pd
import matplotlib.pyplot as plt
Next, we will import HDDM. At the time of this writing, this version was
import hddm
print hddm.__version__
Next, we will load in a data set. The easiest way to get your data into
HDDM is by storing it in a csv (comma-separated-value, see below) file.
In this example we will be using data collected in a reward-based
decision making experiment in our lab (Cavanagh et al 2011). In brief,
subjects choose between two symbols that have different histories of
reinforcement, which they first acquire through a learning phase: some
symbols more often lead to wins (W; 80%, 70% and 60% of trials in which
they are selected), whereas others only lead to wins on 40%, 30%, or 20%
of the time and otherwise lead to losses (L). A test phase ensues in
which subjects choose between all paired combinations of symbols without
feedback. These test trials can be divided into win-win (WW) trials, in
which they select between two symbols that had led to wins before (but
one more often than another); lose-lose trials (LL), and win-lose (WL)
trials, which are the easiest because one symbol had been a winner most
of the time. Thus WW and LL decisions together comprise high conflict
(HC) test trials (although there are other differences between them, we
don’t focus on those here), whereas WL decisions are low conflict (LC).
The main hypothesis of the study was that high conflict trials induce an
increase in the decision threshold, and that the mechanism for this
threshold modulation depends on communication between mediofrontal
cortex (which exhibits increased activity under conditions of choice
uncertainty or conflict) and the subthalamic nucleus (STN) of the basal
ganglia (which provides a temporary brake on response selection by
increasing the decision threshold). The details of this mechanism are
described in other modeling papers (e.g., Ratcliff & Frank 2012).
Cavanagh et al 2011 tested this theory by measuring EEG activity over
mid-frontal cortex, focusing on the theta band, given prior associations
with conflict, and testing whether trial-to-trial variations in frontal
theta were related to adjustments in decision threshold during high
conflict trials. They tested the STN component of the theory by
administering the same experiment to patients who had deep brain
stimulation (dbs) of the STN, which interferes with normal processing.
The first ten lines of the data file look as follows:
!head hddm_demo.csv
subj_idx,stim,rt,response,theta,dbs,conf
0,LL,1.21,1.0,0.00004,1,HC
0,WL,1.9999,1.0,-0.99998,1,LC
0,WW,1.03,1.0,-0.,1,HC
0,WL,2.77,1.0,1.9999,1,LC
0,WW,1.9999,0.0,-0.99999,1,HC
0,WL,1.9999,1.0,-0.99998,1,LC
0,LL,2.0,1.0,-0.00002,1,HC
0,WL,1.04,0.0,0.00004,1,LC
0,WW,0.99998,1.0,0.99999,1,HC
We use the hddm.load_csv() function to load this file.
data = hddm.load_csv('./cavanagh_theta_nn.csv')
data.head(10)
subj_idx  stim  rt  response  theta  dbs  conf
(first ten rows of the DataFrame; columns as in the csv file shown above)
Let's look at the RT distributions of each individual subject using
pandas’ groupby() functionality. Because there are two possible
responses (here we are using accuracy coding where 1 means the more
rewarding symbol was chosen, and 0 the less rewarding) we flip error RTs
to be negative.
data = hddm.utils.flip_errors(data)
fig = plt.figure()
ax = fig.add_subplot(111, xlabel='RT', ylabel='count', title='RT distributions')
for i, subj_data in data.groupby('subj_idx'):
ax.hist(subj_data.rt, bins=20, histtype='step')
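hddm.utils.flip_errors itself is straightforward to emulate in plain pandas; this is a sketch of the behavior, not HDDM's actual code:

```python
import pandas as pd

def flip_errors(df):
    """Emulation of hddm.utils.flip_errors (a sketch): with accuracy
    coding, RTs of incorrect responses (response == 0) are made
    negative so both response distributions can share one axis."""
    df = df.copy()                              # leave the input untouched
    df.loc[df['response'] == 0, 'rt'] *= -1
    return df

demo = pd.DataFrame({'rt': [1.0, 0.8], 'response': [1, 0]})
flipped = flip_errors(demo)
```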
Let's fit a hierarchical DDM to this data set, starting off first with
the simplest model that does not allow parameters to vary by condition.
# Instantiate model object passing it our data (no need to call flip_errors() before passing it).
# This will tailor an individual hierarchical DDM around your dataset.
m = hddm.HDDM(data)
# find a good starting point which helps with the convergence.
m.find_starting_values()
# draw 2000 samples, discarding the first 20 as burn-in
m.sample(2000, burn=20)
[**************100%****************]
2000 of 2000 complete
<pymc.MCMC.MCMC at 0xb0dd58c>
We now want to analyze our estimated model. m.print_stats() will
print a table of summary statistics for each parameter's posterior.
Because that is quite long we only print a subset of the parameters
using pandas selection functionality.
stats = m.gen_stats()
stats[stats.index.isin(['a', 'a_var', 'a_subj.0', 'a_subj.1'])]
a           2.261410
a_var       0.591643
a_subj.0    2.500647
a_subj.1    2.254350
As you can see, the model estimated the group mean parameter for
threshold a, group variability a_var and individual subject
parameters a_subj.0 and a_subj.1. Other parameters are not shown here.
The inference algorithm, MCMC, requires the chains of the model to have
properly converged. While there is no way to guarantee convergence for a
finite set of samples in MCMC, there are many heuristics that allow you
to identify convergence problems. The main things to look at are the
trace, the autocorrelation, and the marginal posterior. You can plot
these using the plot_posteriors() function. For the sake of brevity
we only plot three here. In practice, however, you will always want to
examine all of them.
m.plot_posteriors(['a', 't', 'v', 'a_var'])
Plotting a
As you can see, there are no drifts or large jumps in the trace. The
autocorrelation is also very low.
The Gelman-Rubin statistic provides a more formal test for convergence
that compares the within-chain variance to the between-chain variance
across different runs of the same model.
models = []
for i in range(5):
m = hddm.HDDM(data)
m.find_starting_values()
m.sample(5000, burn=20)
models.append(m)
hddm.analyze.gelman_rubin(models)
[**************100%****************]
5000 of 5000 complete
{'a': 1.3685,
'a_std': 1.0589,
'a_subj.0': 1.2486,
'a_subj.1': 1.3347,
'a_subj.10': 0.04434,
'a_subj.11': 1.6241,
'a_subj.12': 0.92803,
'a_subj.13': 1.198,
'a_subj.2': 1.6893,
'a_subj.3': 0.10168,
'a_subj.4': 1.0589,
'a_subj.5': 1.8458,
'a_subj.6': 1.3637,
'a_subj.7': 1.3925,
'a_subj.8': 1.2398,
'a_subj.9': 1.5141,
't': 1.3631,
't_std': 1.934,
't_subj.0': 1.2438,
't_subj.1': 0.76831,
't_subj.10': 1.345,
't_subj.11': 1.988,
't_subj.12': 0.29111,
't_subj.13': 1.217,
't_subj.2': 1.744,
't_subj.3': 0.38351,
't_subj.4': 1.0087,
't_subj.5': 1.4043,
't_subj.6': 1.1291,
't_subj.7': 1.9302,
't_subj.8': 1.799,
't_subj.9': 1.5015,
'v': 0.85559,
'v_std': 1.3817,
'v_subj.0': 1.6014,
'v_subj.1': 0.51836,
'v_subj.10': 0.16663,
'v_subj.11': 1.6975,
'v_subj.12': 1.8358,
'v_subj.13': 1.4547,
'v_subj.2': 0.6288,
'v_subj.3': 1.9708,
'v_subj.4': 1.5802,
'v_subj.5': 0.80053,
'v_subj.6': 1.6591,
'v_subj.7': 0.60371,
'v_subj.8': 1.0522,
'v_subj.9': 1.7723}
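The statistic underlying hddm.analyze.gelman_rubin can be sketched from raw chains; this follows the standard R-hat formula and may differ in details from HDDM's implementation:

```python
import numpy as np

def gelman_rubin_r(chains):
    """R-hat for a single parameter. chains: (m, n) array of m chains
    with n samples each. Compares the between-chain variance B with
    the average within-chain variance W; values near 1 indicate
    convergence."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)    # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # mean within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled variance estimate
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(0)
converged = rng.normal(0.0, 1.0, size=(4, 1000))
diverged = converged + 5.0 * np.arange(4)[:, None]   # chains stuck at different levels
```

Chains that mix over the same region give R-hat near 1; chains stuck at different levels inflate the between-chain variance and push R-hat well above 1.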
We might also be interested in how well the model fits the data. To
inspect this visually you can call plot_posterior_predictive() to
plot individual subject RT distributions in red on top of the predictive
likelihood in blue.
m.plot_posterior_predictive(figsize=(14, 10))
While visually the fit looks decent, we also have prior knowledge about
our experiment which could be leveraged to improve the model. For
example, we would expect that because LL and WW trials are harder than
WL trials, drift rate would be higher in WL, which has lower uncertainty
about the correct choice. (One could also develop a posterior predictive
check statistic that would evaluate whether accuracy and mean RT are
different in the different conditions. Since the parameters of the model
were estimated to be the same across conditions, the posterior
predictive distributions for these conditions would not look different
from each other, whereas those in the data do. A formal posterior
predictive check would thus show that the data violates the simple
assumptions of the model. This is not evident above because we simply
plotted the distributions collapsed across conditions).
In any case, we can create a new model quite easily which estimates
separate drift-rate v for those different conditions by using the
depends_on keyword argument. This argument expects a Python dict
which maps the parameter to be split to the column name containing the
conditions we want to split by.
m_stim = hddm.HDDM(data, depends_on={'v': 'stim'})
m_stim.find_starting_values()
m_stim.sample(2000, burn=20)
[**************100%****************]
2000 of 2000 complete
<pymc.MCMC.MCMC at 0xaf29ccc>
We will skip examining the traces for this model and instead look at the
posteriors of v for the different conditions. Below you can see that
the drift rate for the low conflict WL condition is substantially
greater than that for the other two conditions, which are fairly similar
to each other.
v_WW, v_LL, v_WL = m_stim.nodes_db.node[['v(WW)', 'v(LL)', 'v(WL)']]
hddm.analyze.plot_posterior_nodes([v_WW, v_LL, v_WL])
While it would be easy to provide syntactic sugar for the above
expression, there are many cases where you want access to the underlying
distributions. These are stored inside nodes_db, a pandas
DataFrame containing information about each distribution; here we
retrieved the actual node objects containing the traces from its node
column.
One benefit of estimating the model in a Bayesian framework is that we
can do significance testing directly on the posterior rather than
relying on frequentist statistics (See Kruschke’s book for many examples
of the advantages of this approach). For example, we might be interested
in whether the drift-rate for WW is larger than that for LL, or whether
drift-rate for LL is larger than WL. The below code allows us to examine
the proportion of the posteriors in which the drift rate for one
condition is greater than the other. It can be seen that the posterior
for LL does not overlap at all with that for WL, and thus the
probability that LL is greater than WL should be near zero.
print "P(WW > LL) = ", (v_WW.trace() > v_LL.trace()).mean()
print "P(LL > WL) = ", (v_LL.trace() > v_WL.trace()).mean()
P(WW > LL) =
P(LL > WL) =
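For readers who want to reproduce this kind of comparison outside of HDDM, the computation reduces to counting, across paired posterior samples, how often one trace exceeds the other. A minimal sketch with simulated stand-in traces (the Gaussian samples are hypothetical, not values from the model above):

```python
import random

random.seed(0)

# Hypothetical posterior samples for two drift rates; in HDDM these
# would come from node.trace()[:].
v_ww = [random.gauss(0.35, 0.10) for _ in range(2000)]
v_ll = [random.gauss(0.30, 0.10) for _ in range(2000)]

def p_greater(trace_a, trace_b):
    """Proportion of paired posterior samples in which a > b."""
    return sum(a > b for a, b in zip(trace_a, trace_b)) / len(trace_a)

print("P(WW > LL) = %.3f" % p_greater(v_ww, v_ll))
```

Because the samples are drawn from the joint posterior, this proportion directly approximates the posterior probability of the ordering.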
Let's compare the two models using the deviance information criterion (DIC; lower is better). Note that the DIC measures the fit of the model to the data, penalizing for complexity in the addition of degrees of freedom (the model with three drift rates has more degrees of freedom than the model with one). The DIC is known to be somewhat biased toward selecting the model with greater complexity, although alternative forms exist (see Plummer 2008). One should use the DIC with caution, although other forms of model comparison such as the Bayes Factor (BF) have other problems, such as being overly sensitive to the prior parameter distributions of the models. Future versions of HDDM will include the partial Bayes Factor, which allows the BF to be computed based on informative priors taken from a subset of the data, and which we generally believe to provide a better measure of model fit. Nevertheless, DIC can be a useful metric with these caveats in mind.
print "Lumped model DIC: %f" % m.dic
print "Stimulus model DIC: %f" % m_stim.dic
Lumped model DIC:
Stimulus model DIC:
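The DIC combines the posterior mean deviance with a complexity penalty pD, the difference between the mean deviance and the deviance at the posterior mean. A toy sketch of the bookkeeping for a normal model with known variance (data and posterior draws are made up for illustration):

```python
import math

# Made-up data and posterior draws for the mean of a N(mu, 1) model.
data = [0.8, 1.2, 1.0, 0.9, 1.1]
mu_samples = [0.9, 1.0, 1.05, 0.95, 1.0, 1.1]

def deviance(mu):
    # Deviance = -2 * log-likelihood of the data given mu.
    loglike = sum(-0.5 * math.log(2 * math.pi) - 0.5 * (x - mu) ** 2
                  for x in data)
    return -2.0 * loglike

d_bar = sum(deviance(mu) for mu in mu_samples) / len(mu_samples)  # mean deviance
mu_bar = sum(mu_samples) / len(mu_samples)                        # posterior mean
p_d = d_bar - deviance(mu_bar)   # effective number of parameters
dic = d_bar + p_d                # lower is better
print("pD = %.3f, DIC = %.3f" % (p_d, dic))
```

A model with more free parameters typically has a larger pD, which is how the extra drift rates in m_stim are penalized.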
Note that while the m_stim model we created above estimates
different drift-rates v for each subject, it implicitly assumes that
the different conditions are completely independent of each other,
because each drift rate was sampled from a separate group prior.
However, there may be individual differences in overall performance, and
if so it is reasonable to assume that someone who would be better at
WL would also be better at LL. To model this intuition we can
use a within-subject model where an intercept is used to capture overall
performance in the ‘WL’ condition as a baseline, and then the other
LL and WW conditions are expressed relative to WL. (Perhaps
every subject has a higher drift in WL than LL but there is huge
variance in their overall drift rates. In this scenario, the earlier
model would not have the power to detect the effect of condition on this
within subject effect, because there would be large posterior variance
in all of the drift rates, which would then overlap with each other. In
contrast, the within-subject model would estimate large variance in the
intercept but still allow the model to infer a non-zero effect of
condition with high precision).
HDDM supports this via the patsy module which transforms model
strings to design matrices.
from patsy import dmatrix
dmatrix("C(stim, Treatment('WL'))", data.head(10))
DesignMatrix with shape (10, 3)
  Intercept  C(stim, Treatment('WL'))[T.LL]  C(stim, Treatment('WL'))[T.WW]
  ...
  Terms:
    'Intercept' (column 0)
    "C(stim, Treatment('WL'))" (columns 1:3)
Patsy model specifications can be passed to the HDDMRegressor
class as part of a descriptor that contains the string describing the
linear model and the outcome variable that should be replaced with
the output of the linear model – in this case v.
m_within_subj = hddm.HDDMRegressor(data, "v ~ C(stim, Treatment('WL'))")
Adding these covariates:
['v_Intercept', "v_C(stim, Treatment('WL'))[T.LL]", "v_C(stim, Treatment('WL'))[T.WW]"]
m_within_subj.sample(5000, burn=200)
[**************100%****************]
5000 of 5000 complete
<pymc.MCMC.MCMC at 0xb41712c>
v_WL, v_LL, v_WW = m_within_subj.nodes_db.node[["v_Intercept",
                                                "v_C(stim, Treatment('WL'))[T.LL]",
                                                "v_C(stim, Treatment('WL'))[T.WW]"]]
hddm.analyze.plot_posterior_nodes([v_WL, v_LL, v_WW])
Note that in the above plot LL and WW are expressed relative to
the WL condition (i.e. v_Intercept). You can see that the
overall drift rate intercept, here applying to WL condition, is positive
(mode value roughly 0.7), whereas the within subject effects of
condition (WW and LL) are negative and do not overlap with zero.
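Since the offsets are expressed relative to the intercept, the per-condition drift rates are recovered by simple addition. A sketch with hypothetical posterior-mean values (the numbers below are illustrative stand-ins, not the model's actual estimates):

```python
# Recover per-condition drift rates from the intercept plus offsets.
# All coefficient values here are made up for illustration.
v_intercept = 0.7    # drift in the WL baseline condition
effect_ll = -0.35    # hypothetical v_C(stim, Treatment('WL'))[T.LL]
effect_ww = -0.30    # hypothetical v_C(stim, Treatment('WL'))[T.WW]

v_wl = v_intercept               # reference level: intercept alone
v_ll = v_intercept + effect_ll   # offsets add to the baseline
v_ww = v_intercept + effect_ww
print("v(WL)=%.2f  v(LL)=%.2f  v(WW)=%.2f" % (v_wl, v_ll, v_ww))
```

Negative offsets that exclude zero thus translate directly into lower drift rates for LL and WW than for WL.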
As mentioned above, cognitive neuroscience has embraced the DDM because
it enables researchers to link psychological processes to brain
measures. The Cavanagh et al. (2011) study is a great example of this.
EEG recordings provided a trial-by-trial measure of brain activity
(frontal theta), and it was found that this activity correlated with
increases in decision threshold on high conflict trials. Note that the
data set and results exhibit more features than we consider here
(specifically the manipulation of deep brain stimulation), but for
illustrative purposes we replicate here that main theta-threshold
relationship in a model restricted to participants without brain
stimulation. For more information, see Cavanagh et al. (2011).
m_reg = hddm.HDDMRegressor(data[data.dbs == 0],
                           "a ~ theta:C(conf, Treatment('LC'))",
                           depends_on={'v': 'stim'})
Adding these covariates:
['a_Intercept', "a_theta:C(conf, Treatment('LC'))[HC]", "a_theta:C(conf, Treatment('LC'))[LC]"]
Instead of estimating one static threshold per subject across trials,
this model assumes the threshold to vary on each trial according to the
linear model specified above (as a function of the measured theta
activity). We also test whether this effect interacts with decision
conflict. For the stimuli we use dummy treatment coding with the
intercept being set on the WL condition. Internally, HDDM uses Patsy for
the linear model specification; see the Patsy documentation for
details. The output notifies us about the different variables that are
being estimated as part of the linear model. The Cavanagh paper, and
results shown later below, illustrate that this brain/behavior
relationship differs as a function of whether patients are on or off
STN deep brain stimulation, as hypothesized by the model that the STN
is responsible for increasing the decision threshold when cortical
theta rises.
m_reg.sample(5000, burn=200)
[**************100%****************]
5000 of 5000 complete
theta = m_reg.nodes_db.node["a_theta:C(conf, Treatment('LC'))[HC]"]
hddm.analyze.plot_posterior_nodes([theta], bins=20)
print "P(a_theta > 0) = ", (theta.trace() > 0).mean()
P(a_theta > 0) =
The above posterior shows that the effect of trial-to-trial variations
in frontal theta is to increase the estimated decision threshold: the
regression coefficient is positive, and more than 96% of the posterior
mass is greater than zero.
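Concretely, the regression means that each trial gets its own threshold, computed from the intercept and the conflict-specific theta slope. A sketch with made-up coefficients (the real model estimates separate HC and LC slopes, as in the covariate list above):

```python
# Trial-wise threshold implied by the linear model
# a ~ theta:C(conf, Treatment('LC')). All coefficients are hypothetical.
a_intercept = 1.8                   # baseline decision threshold
a_coef = {'HC': 0.15, 'LC': 0.02}   # made-up theta slopes per conflict level

def threshold(theta, conf):
    """Threshold for one trial given its theta power and conflict level."""
    return a_intercept + a_coef[conf] * theta

# Higher frontal theta raises the threshold, especially on HC trials.
print(threshold(theta=1.0, conf='HC'))
print(threshold(theta=1.0, conf='LC'))
```

HDDM evaluates the DDM likelihood of each trial under its own threshold value, which is what lets a continuous brain measure inform the parameter estimates.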
As noted above, this experiment also tested patients on deep brain
stimulation (dbs). The full model in the paper thus allowed an
additional factor to estimate how dbs interacts with theta-threshold
relationship. Here we show for illustrative purposes that we can capture
the same effect by simply fitting a separate model to data only
including the case when dbs was turned on. You should see below that in
this case, the influence of theta on threshold reverses. This exercise
thus shows that HDDM can be used both to assess the influence of
trial-by-trial brain measures on DDM parameters, but also how parameters
vary when brain state is manipulated.
m_reg_off = hddm.HDDMRegressor(data[data.dbs == 1],
                               "a ~ theta:C(conf, Treatment('LC'))",
                               depends_on={'v': 'stim'})
Adding these covariates:
['a_Intercept', "a_theta:C(conf, Treatment('LC'))[HC]", "a_theta:C(conf, Treatment('LC'))[LC]"]
m_reg_off.sample(5000, burn=200)
[**************100%****************]
5000 of 5000 complete
<pymc.MCMC.MCMC at 0xbfc9e8c>
theta = m_reg_off.nodes_db.node["a_theta:C(conf, Treatment('LC'))[HC]"]
hddm.analyze.plot_posterior_nodes([theta], bins=10)
print "P(a_theta > 0) = ", (theta.trace() > 0).mean()
P(a_theta > 0) =
It is common to have outliers in any data set and RT data is no
exception. Outliers present a serious challenge to likelihood-based
approaches, as used in HDDM. Consider the possibility that 5% of trials
are not generated by the DDM process, but by some other process (e.g.
due to an attentional lapse). The observed data in those trials may be
very unlikely given the best DDM parameters that fit 95% of the data. In
the extreme case, the likelihood of a single trial may be zero (e.g. if
subjects respond very quickly, faster than the non-decision time t
parameter that would fit the rest of the data). Thus this single outlier
would force the DDM parameters to adjust substantially. To see the
effect of this we will generate data with outliers, but fit a standard
DDM model without taking outliers into account.
outlier_data, params = hddm.generate.gen_rand_data(params={'a': 2, 't': .4, 'v': .5}, size=200, n_fast_outliers=10)
m_no_outlier = hddm.HDDM(outlier_data)
m_no_outlier.sample(2000, burn=50)
[**************100%****************]
2000 of 2000 complete
<pymc.MCMC.MCMC at 0xad7a90c>
m_no_outlier.plot_posterior_predictive()
As you can see, the predictive likelihood does not fit the RT data very
well. The model predicts far more RTs near the leading edge of the
distribution than are actually observed. This is because the
non-decision time t is forced to be estimated small enough to
accommodate the few unrealistically fast outlier RTs.
What we can do instead is fit a mixture model which assumes that
outliers come from a uniform distribution. (Note, outliers do not have
to be very fast or very slow, and the above example is just an obvious
illustration. Some proportion of the trials can be assumed to simply
come from a different process for which we make no assumptions about its
generation, and hence use a uniform distribution. This allows the model
to find the best DDM parameters that capture the majority of trials).
Here, we specify that we expect roughly 5% outliers in our data.
m_outlier = hddm.HDDM(outlier_data, p_outlier=.05)
m_outlier.sample(2000, burn=20)
[**************100%****************]
2000 of 2000 complete
<pymc.MCMC.MCMC at 0xaf2c9cc>
m_outlier.plot_posterior_predictive()
As you can see, the model provides a much better fit. The outlier RTs
are having less of an effect because they get assigned to the uniform
outlier distribution.
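The mixture can be written down in a few lines: each trial's likelihood is a weighted sum of the DDM density and a uniform lapse density. The sketch below uses a hypothetical placeholder density in place of HDDM's actual DDM likelihood, just to show the structure:

```python
import math

p_outlier = 0.05   # assumed contamination rate, as passed to HDDM above
rt_max = 5.0       # assumed support of the uniform outlier density

def ddm_like(rt):
    # Placeholder stand-in density (log-normal-shaped); NOT the real
    # DDM first-passage-time likelihood that HDDM computes internally.
    return math.exp(-(math.log(rt)) ** 2) / rt if rt > 0 else 0.0

def mixture_like(rt):
    # Weighted sum of the model density and the uniform lapse density.
    uniform = 1.0 / rt_max
    return (1.0 - p_outlier) * ddm_like(rt) + p_outlier * uniform

# A very fast RT that the main model assigns near-zero density still
# receives non-zero likelihood from the uniform component, so a single
# outlier can no longer drag the DDM parameters around.
print(mixture_like(0.05))
```

This is why the fit improves: the likelihood of an outlier trial is bounded below by p_outlier / rt_max, so the DDM parameters are free to describe the bulk of the data.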
Conclusions
We developed an open-source Python toolbox that allows hierarchical
Bayesian parameter estimation of the DDM and LBA and is successfully
being used in various cognitive research laboratories. The benefit of
this approach is two-fold: (i) improved accuracy of individual subject
parameter recovery by modeling subject similarity in a hierarchical
framework, and (ii) full Bayesian posterior inference that obviates the
need for frequentist statistics and confidence intervals. Despite prior
methodological advancements in this area, a reliable software package
implementing this method was previously not available. Special care
has been taken to make this software easy to install and use. A simple
yet powerful model description mechanism allows the creation of
arbitrarily complex models where separate parameters can be estimated
for different conditions. Additional features like
posterior-predictive-checks allow for model validation and
comparison. Finally, to facilitate cross-domain application of this
toolbox in the cognitive neurosciences, it is simple to estimate
trial-by-trial influences of e.g. brain-measures on individual model
parameters.
Acknowledgements
The software was written by TVW and IS. The paper was written by TVW
and MJF. We thank Guido Biele and Øystein Sandvik for useful feedback
and code contributions.
References
Stefan Behnel, Robert Bradshaw, Craig Citro, Lisandro Dalcin, Dag Sverre Seljebotn, and Kurt Smith. Cython: the best of both worlds. Computing in Science & Engineering, 13(2):31–39, March 2011.
Scott D. Brown and Andrew Heathcote. The simplest complete model of choice response time: linear ballistic accumulation. Cognitive Psychology, 57(3):153–178, November 2008.
James F. Cavanagh, Thomas V. Wiecki, Michael X. Cohen, Christina M. Figueroa, Johan Samanta, Scott J. Sherman, and Michael J. Frank. Subthalamic nucleus stimulation reverses mediofrontal influence over decision threshold. Nature Neuroscience, September 2011.
Ivar A. H. Clemens, Maaike De Vrijer, Luc P. J. Selen, Jan A. M. Van Gisbergen, and W. Pieter Medendorp. Multisensory processing in spatial orientation: an inverse probabilistic approach. The Journal of Neuroscience, 31(14):5365–5377, April 2011.
Chris Donkin, Scott Brown, Andrew Heathcote, and Eric-Jan Wagenmakers. Diffusion versus linear ballistic accumulation: different models but the same conclusions about psychological processes? Psychonomic Bulletin & Review, 18(1):61–69, February 2011.
William Feller. An Introduction to Probability Theory and Its Applications, Vol. 1, 3rd edition. Wiley, 1968.
Birte U. Forstmann, Gilles Dutilh, Scott Brown, Jane Neumann, D. Yves von Cramon, K. Richard Ridderinkhof, and Eric-Jan Wagenmakers. Striatum and pre-SMA facilitate decision-making under time pressure. Proceedings of the National Academy of Sciences of the United States of America, 105(45):17538–17542, November 2008.
D. Gamerman and H. F. Lopes. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall/CRC, 2006.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall/CRC, 2003.
J. Kruschke. Doing Bayesian Data Analysis: A Tutorial Introduction with R and BUGS. Academic Press / Elsevier, 2010.
David LaBerge. A recruitment theory of simple behavior. Psychometrika, 27(4):375–396, December 1962.
M. D. Lee and E.-J. Wagenmakers. Bayesian Modeling for Cognitive Science: A Practical Course. Cambridge University Press, 2013.
D. Matzke and E.-J. Wagenmakers. Psychological interpretation of the ex-Gaussian and shifted Wald parameters: a diffusion model analysis. Psychonomic Bulletin & Review, 2009.
Håkan Nilsson, Jörg Rieskamp, and Eric-Jan Wagenmakers. Hierarchical Bayesian parameter estimation for cumulative prospect theory. Journal of Mathematical Psychology, 55(1):84–93, February 2011.
Anand Patil, David Huard, and Christopher J. Fonnesbeck. PyMC: Bayesian stochastic modelling in Python. Journal of Statistical Software, 35(4):1–81, July 2010.
Dale J. Poirier. The growth of Bayesian methods in statistics and economics since 1970. Bayesian Analysis, 1(4):969–979, December 2006.
R. Ratcliff, M. G. Philiastides, and P. Sajda. Quality of evidence for perceptual decision making is indexed by trial-to-trial variability of the EEG. Proceedings of the National Academy of Sciences, 106(16), 2009.
Roger Ratcliff and Gail McKoon. The diffusion decision model: theory and data for two-choice decision tasks. Neural Computation, 20(4):873–922, April 2008.
Roger Ratcliff and J. N. Rouder. Modeling response times for two-choice decisions. Psychological Science, 9(5):347–356, September 1998.
R. M. Shiffrin, M. D. Lee, and W. Kim. A survey of model evaluation approaches with a tutorial on hierarchical Bayesian methods. Cognitive Science, 2008.
Philip L. Smith and Roger Ratcliff. Psychology and neurobiology of simple decisions. Trends in Neurosciences, 27(3):161–168, March 2004.
Matthew Stephens and David J. Balding. Bayesian statistical methods for genetic association studies. Nature Reviews Genetics, 10(10):681–690, October 2009.
James T. Townsend and F. Gregory Ashby. The Stochastic Modeling of Elementary Psychological Processes. Cambridge University Press, 1983.
Joachim Vandekerckhove and Francis Tuerlinckx. Diffusion model analysis with MATLAB: a DMAT primer. Behavior Research Methods, 40(1):61–72, February 2008.
Joachim Vandekerckhove, Francis Tuerlinckx, and Michael D. Lee. Hierarchical diffusion models for two-choice response times. Psychological Methods, 16(1):44–62, March 2011.
D. Vickers. Evidence for an accumulator model of psychophysical discrimination. Ergonomics, 13(1):37–58, January 1970.
Andreas Voss and Jochen Voss. Fast-dm: a free program for efficient diffusion model analysis. Behavior Research Methods, 39(4):767–775, November 2007.
