1 Introduction

One of the most defining assumptions in task experiments is that neural activity is an idealized variable that is either on or off following the exact timing of a presented stimulus. This ‘idealized neural activity’ assumption has shaped the task experiment literature for decades and constitutes the origin of hypothesis-driven models in fMRI. The most prominent hypothesis-driven model is the GLM [1]. The design of the GLM is based on the following assumptions: (a) the stimulus presentation during a task experiment results in an immediate increase in neural activation; (b) the increase in neural activity translates into a BOLD signal change by a filter operation following the laws of a linear time-invariant (LTI) system [2]. The impulse response function of this LTI system is referred to as hemodynamic response function (HRF) in fMRI. The estimates of the GLM result in a statistical parametric mapping (SPM) showing to what degree individual areas of the brain are involved in the stimulus processing.

In contrast to the hypothesis-driven GLM, there are data-driven methods such as spatial independent component analysis (ICA) [3] that factorizes fMRI data into a set of timecourses and spatial maps by optimizing a proxy function for spatial independence.

We propose HMF, a novel data-driven technique that decomposes fMRI data into a set of components, where each component consists of a latent neural activation time course and a spatial map. The neural activation time course is translated into a BOLD time course with the canonical HRF. Blind to the original stimulus timing, HMF recovers neural activation time courses that match the idealized neural activation assumed in the motor task experiment and reveal non-compliant subject behavior.

2 Hemodynamic Matrix Factorization

The proposed generative model is concerned with BOLD time courses of length T for S subjects, measured on V voxels: \(\{\mathbf {Y_s} \in \mathbb R^{T\times V},s=1\cdots S\}\). We assume that the signal change observed in all time-concatenated subjects \(\mathbf {Y}\in \mathbb R^{(T*S)\times V}\) is driven by C latent components that form depending on the performed task. Our model assumes that each component consists of a BOLD time course \(b_c \in \mathbb R^{T\times 1}\) and a spatial map \(h_c \in \mathbb R^{1\times V}\). The BOLD time course \(b_c=n_c\circledast f\) is obtained by convolving a neural activation time course \(n_c\) with a canonical haemodynamic response function (HRF) f. Given that multiple time courses are convolved with the same filter f, the convolution operation can be rewritten as matrix multiplication of a time course matrix \(\mathbf {N}\in \mathbb {R}^{T\times C}\) with Töplitz matrix \(\mathbf {F}\in \mathbb {R}^{T\times T}\) of filter f such that \(\mathbf {B=FN}\). The proposed generative model for the data \(\mathbf {Y}\) is then obtained by C components such that

$$\begin{aligned} \mathbf {Y} = \mathbf {FNH}, \end{aligned}$$
(1)

where matrix \(\mathbf {N} \in \mathbb R^{T*S\times C}\) contains the individual neural activation time courses per subject \(\mathbf {N_s}\in \mathbb R^{T\times C}\) and matrix \(\mathbf {H} \in \mathbb R^{C\times V}_+\) contains a set of spatial maps shared among all subjects.

We minimize the error between the actual data \(\mathbf {Y}\) and the approximated data generated by the model \(\mathbf {FNH}\), resulting in the following cost function:

$$\begin{aligned} J=\frac{1}{2}\Vert \mathbf {M}\odot \left( \mathbf {FNH}-\mathbf {Y}\right) \Vert _2^{2}=\frac{1}{2}\Vert \mathbf {M}\odot \left( \mathbf {FN}\sigma (\mathbf {(FN)^\top Y} + b)-\mathbf {Y}\right) \Vert _2^{2}, \end{aligned}$$
(2)

where the matrix \(\mathbf {M}\in \mathbb R^{1\times V}\) weighs the importance of each individual voxels.

The regularization for the spatial map matrix \(\mathbf {H}\) and neural activation matrix \(\mathbf {N}\) are introduced in the following.

Regularization of Spatial Maps \(\mathbf {H}\). To enforce sparse spatial maps, we use the Kullback Leibler divergence between two exponential distributions \(\varDelta (p||{q(H_c)}) = \log (\lambda ) - \log ({\hat{\lambda }_c}) + \frac{{\hat{\lambda }_c}}{\lambda } -1\) between desired rate parameter \(\lambda \) and estimated rate parameter \(\hat{\lambda _c}\) of an exponential value distribution approximating the value distribution in each spatial map \(H_c\in \mathbb R^{1\times V}\), using the following regularization term:

$$\begin{aligned} R_{IS} = \frac{1}{C}\sum _{c=1}^{C}\varDelta (p||{q(H_c)}). \end{aligned}$$
(3)

In addition to sparsity, we want to enforce smoothness of the spatial maps using anisotropic total variation. Given that we can reshape spatial map matrix \(\mathbf {H}\) in a 4D tensor \(\mathbf {H^{(4D)}}\), we want to minimize the following regularization cost:

(4)

The product of \(\mathbf {H} = \sigma (\mathbf {(FN)^\top Y}+b)\) can be rewritten element-wise as

(5)

where \(\sigma (\mathbf {(FN)^\top } y_v+b)\) constitutes a column vector for voxel v. All column vectors taken together form the spatial maps \(\mathbf {H}\) as depicted in Eq. 5. Furthermore, one can rearrange the original data \(\mathbf {Y}\) in individual matrices along either dimension V1, V2 or V3 of the three-dimensional Euclidean space. An approximation of Eq. 4 can then be computed with the following term

$$\begin{aligned} \begin{aligned} R_{\mathbf {H}}=&\sum _{i}^{V2}\sum _{j}^{V3} \Vert \sigma (\mathbf {(FN)^\top Y_{ij}^{(1)}}+b)\mathbf {D_1^\top } \Vert _1 \\&+ \sum _{i}^{V1}\sum _{j}^{V3} \Vert \sigma (\mathbf {(FN)^\top Y_{ij}^{(2)}}+b)\mathbf {D_2^\top } \Vert _1 + \sum _{i}^{V1}\sum _{j}^{V2} \Vert \sigma (\mathbf {(FN)^\top Y_{ij}^{(3)}}+b)\mathbf {D_3^\top } \Vert _1 \end{aligned} \end{aligned}$$
(6)

where \(\mathbf {Y_{ij}^{(1)}}\in \mathbb R^{T\times V1}\), \(\mathbf {Y_{ij}^{(2)}}\in \mathbb R^{T\times V2}\), \(\mathbf {Y_{ij}^{(3)}}\in \mathbb R^{T\times V3}\), \(\mathbf {D_1}\in \mathbb R^{(V1-1)\times V1}\), \(\mathbf {D_2}\in \mathbb R^{(V2-1)\times V2}\) and \(\mathbf {D_3}\in \mathbb R^{(V3-1)\times V3}\). The difference operator matrices \(\mathbf {D_i}\) have the form

$$\begin{aligned} \mathbf {D_i} = \left( \begin{array} { c c c c c c } { - 1 } &{} { 1 } &{} { } &{} { } &{} { } &{} { } \\ { } &{} { - 1 } &{} { 1 } &{} { } &{} { } &{} { } \\ { } &{} { } &{} { - 1 } &{} { 1 } &{} { } &{} { } \\ { } &{} { } &{} { } &{} { \ddots } &{} { } \\ { } &{} { } &{} { } &{} { - 1 } &{} { 1 } \\ { } &{} { } &{} { } &{} { } &{} { - 1 } &{} { 1 } \end{array} \right) \end{aligned}$$
(7)

Regularization of Neural Activation \(\mathbf {N}\). The approximate anisotropic total variation proposed for spatial regularization in Eq. 7 can also be applied as temporal regularization to the neural time course matrix \(\mathbf {N}\) with corresponding cost function

$$\begin{aligned} R_{\mathbf {N}} = \Vert \mathbf {N}\mathbf {D_4}\Vert _1, \end{aligned}$$
(8)

where \(\mathbf {D_4}\in \mathbb R^{(T*S-1)\times (T*S)}\).

The overall cost \(\varOmega =J+R_{N}+R_{IS}+R_{H}\) is optimized with respect to neural activation time courses \(\mathbf {N}\).

3 Experiments and Results

3.1 Experimental Setup

We applied HMF to the motor task of the Midnight scan club (MSC) [4] open source dataFootnote 1, which comprises 10 healthy adult subjects performing the same motor task in 10 different sessions.

The motor task design consists of 5 repeated blocks, in which a subject performs one of the 5 movement types, which are left or right foot, left or right hand, or tongue movement. There is a visual cue at the beginning of each block instructing the subject of what movement to conduct. For each fMRI scan, volumes were aligned to the first volume to correct for head motion. The first volume was linearly registered to its corresponding bias-corrected T1-weighted anatomical scan. The intra-subject affine registration and non-linear registration to the Montreal Neurological Institute (MNI) template were combined to map all fMRI volumes with one re-sampling into the MNI space sampled in a 4mm isotropic resolution. Time courses of voxels within brain tissue were extracted, high-pass filtered (0.01Hz cut-off) to remove signal drifts from scanner instabilities, centered and variance-normalized. The functional 4D volume set per scan s was reshaped into a matrix \(\mathbf {Y}_s \in \mathbb R^{T\times V}\) with T time points and V voxels.

In the following section, we compare the neural activation time courses of the HMF components to the stimulus timing of each task. We then compare the corresponding BOLD time courses of these HMF components to the timecourses obtained by spatial ICA. ICA decomposes the data \(\mathbf {Y=BH}\) into a matrix of BOLD time courses \(\mathbf {B} \in \mathbb R^{T*S\times C}\) and a matrix of spatial maps \(\mathbf {H} \in \mathbb R^{C\times V}\). For both decompositions, we set the number of components C to either 20, 40, 60, 80, or 100 components.

3.2 Neural Activation

Figure 1 shows the spatial map and neural activation time course of the 5 most correlated HMF components to the visual cues (decreasing top to bottom) when \(C=100\). The spatial maps obtained by HMF match with anatomical areas that are likely involved in the motor task processing. The spatial maps shown in the first and fourth row of Fig. 1, are related to visual stimulus processing. Whereas the second, third and fifth row show spatial maps that comprise anatomical areas involved in motor planning and execution, such as supplementary motor cortex and basal ganglia network.

Fig. 1.
figure 1

Spatial map and neural activation time course (continuous line) of the 5 most correlated HMF components to the visual cue timing (dotted line) for the first session of each subject. (SMC = Supplementary Motor Cortex)

Fig. 2.
figure 2

Spatial map and neural activation time course (continuous line) of the 5 different movements, namely, left and right foot, left and right hand and tongue (from the top to bottom), of the most correlated HMF component to the block timing of the movement (dotted line) for the first session of each subject. The red dotted box shows subject 6 confusing right and left hand movement in the second task block. (Color figure online)

Figure 2 depicts the five individual block timings (dotted lines) related to the visual cue shown in Fig. 1. Each row corresponds to one of the 5 movements: left foot (first row), right foot (second row), left hand (third row), right hand (fourth row) and tongue (fifth row). The HMF decomposition provides a good model for hand and tongue movements as the estimated neural activation time course closely resembles the idealized neural activation assumed for the task. It is evident from Fig. 2 that the noise level in subject 4, results in a poor model fit, likely due to head motion.

Most importantly, HMF is able to detect that subject 6 performed a right hand movement instead of a left hand movement and vice versa in the second task block (red box in Fig. 2). Finally, the neural activation time courses for the foot movement not always resemble the expected idealized neural activation for all subjects. Indeed, the neural activation time course in subject 1 closely resembles the task design whereas there is overlap between neural activation in left and right foot HMF component for subject 2.

3.3 Comparison of HMF and ICA

We compare BOLD time course and spatial map of the HMF components to the respective equivalents of ICA. Figure 3 depicts the distribution of correlation values of the most correlated HMF component to each individual movement and visual timing convolved with the canonical HRF (i.e., left and right toe, left and right hand and tongue). Figure 4 shows the reproducibility of the spatial map of the one component associated with an individual movement or visual cue in each session. The reproducibility is defined by the pair-wise spatial correlation between corresponding components of two distinct sessions.

HMF and ICA produced components with similar spatial and temporal structure as presented in the supplementary material in Figs. 6 and 7, respectively. This is evident in Fig. 3 where the correlation between the BOLD time courses and the task design increases with the increase of the number of components C. HMF components presented with a higher median correlation than the components obtained by ICA for all the task and all values of C except for the left and right foot when \(C=80\). Moreover, the spatial maps of HMF are more reproducible than the spatial maps of ICA as shown in Fig. 3.

Fig. 3.
figure 3

The distribution of temporal correlation of HMF and ICA components with the corresponding task timing.

Fig. 4.
figure 4

The distribution of spatial correlation between components of different task sessions that relate to the same task timing.

4 Discussion

We proposed a novel data-driven technique called HMF that decomposes fMRI data into components whose neural activation time courses resemble timings of the respective visual stimulation and movement execution timing. Most importantly, the timing information of individual stimuli and task execution were not known to HMF. For the visual stimulation, HMF identified 5 components that were highly correlated with visual cue timings suggesting their involvement in the processing of the visual cues as well as the contemplation of subsequent motor execution in the movement blocks. Regarding the movement execution, HMF identifies only one highly correlated component with the block timings (see Fig. 5 in supplementary material) for each type of the 5 movements.

We compared HMF to ICA, which is the most common data-driven tool to infer brain networks from spontaneous intrinsic neural activity. The results showed that both techniques produced task-related components whose BOLD time courses and spatial maps are highly similar. However, HMF produced on average more correlated components than ICA. This is likely due to the regularization of HMF that relates adjacent points in time and space in fMRI data, whereas there is no such regularization in ICA.

In contrast to data-driven techniques, hypothesis-driven techniques such as GLM require assumptions of how the task design relates to neural activation and subsequent change in BOLD. Although HMF did not use such task timing information, it generated spatial maps that matched the spatial activation maps presented in [4]. The GLM provides a statistical test framework to infer brain activation maps involved in the task processing. However, in contrast to data-driven techniques, the GLM cannot model intrinsic brain activity and relies crucially on task participation to obtain correct statistical test estimates.

If task participation is different than expected, statistical inference within the GLM framework will lead to a wrong statistical estimates. Tools such as HMF are therefore needed to validate task participation, especially in particular cohorts such as children that are likely to be less task compliant.

Data-driven techniques produce varying results depending on hyper-parameters, i.e. the number of components or the strength of applied regularization. Inferring neural activation from BOLD is an ill-posed problem given that multiple neural activation patterns can produce the exact same BOLD time course. We apply total variation regularization assuming that the most plausible is likely the most variational simple solution. However, sensible hyper-parameters can be found by finding a balance between reproducibility and explained variance.

We have developed a robust physiology-based methodology to analyze task-based fMRI data at the individual level. Methods such as this will help us to guide our future understanding of the processes of cognition.