H. A. Scheppink1, S. Ahmadi1, P. Desain1, M. Tangermann1, J. Thielen1

1Donders Institute, Radboud University, Nijmegen, the Netherlands

E-mail: jordy.thielen@donders.ru.nl

ABSTRACT: Auditory attention decoding (AAD) aims to extract from brain activity the attended speaker amidst candidate speakers, offering promising applications for neuro-steered hearing devices and brain-computer interfacing. This pilot study takes a first step towards AAD using the noise-tagging stimulus protocol, which evokes reliable code-modulated evoked potentials but is minimally explored in the auditory modality. Participants were sequentially presented with two Dutch speech stimuli that were amplitude-modulated with a unique binary pseudo-random noise-code, effectively tagging them with additional decodable information. We compared the decoding of unmodulated audio against audio modulated at various modulation depths, and a conventional AAD method against a standard method to decode noise-codes. Our pilot study revealed higher performance for the conventional method at 70 to 100 percent modulation depth compared to unmodulated audio. The noise-code decoder did not further improve these results. These fundamental insights highlight the potential of integrating noise-codes into speech to enhance auditory speaker detection when multiple speakers are presented simultaneously.
ρ(wᵀS, rᵀZ)   (1)

where S = [X_1, ..., X_j, ..., X_J] are the concatenated training EEG segments, and Z = [A_{y_1}, ..., A_{y_j}, ..., A_{y_J}] are the accompanying concatenated speech envelopes. To classify new data X ∈ R^{C×T} (here T is the decision window length), eCCA chooses the candidate speech envelope that maximizes the correlation between the spatially filtered EEG data and the projected speech envelopes:

ŷ = arg max_i ρ(wᵀX, rᵀA_i)   (2)

Instead of using the first component only, as in Eq. 2, using multiple components can improve classification accuracy, but requires an additional classification model, e.g., a linear discriminant analysis (LDA). Specifically, CCA can deliver K = min(C, L) orthogonal components, ordered by decreasing canonical correlation. The k-th component contributes a spatial filter w_k and a temporal filter r_k, and delivers a Pearson correlation coefficient ρ_k^i as in Eq.
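The decision rule of Eq. 2 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the helper names `lagged` and `ecca_decide` are hypothetical, and the temporal filter is assumed to act on a stack of delayed envelope copies.

```python
import numpy as np

def lagged(env, n_lags):
    """Stack n_lags delayed copies of a speech envelope (assumed
    representation), yielding A of shape (n_lags, T) so that r @ A
    is the temporally projected envelope."""
    T = len(env)
    A = np.zeros((n_lags, T))
    for l in range(n_lags):
        A[l, l:] = env[:T - l]
    return A

def ecca_decide(X, envelopes, w, r):
    """Eq. 2 sketch: pick the candidate envelope whose projection
    correlates best with the spatially filtered EEG.

    X         : EEG segment of shape (C, T), T = decision window length
    envelopes : list of candidate speech envelopes, each of length T
    w         : spatial filter of shape (C,)
    r         : temporal filter of shape (n_lags,)
    """
    x = w @ X  # spatially filtered EEG, shape (T,)
    scores = [np.corrcoef(x, r @ lagged(env, len(r)))[0, 1]
              for env in envelopes]
    return int(np.argmax(scores))
```

With filters learned on training data, the classifier simply returns the index of the best-correlating speaker for each decision window.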
These correlation coefficients across the K components (here K = 3) are collected in a vector ρ_i for speaker i, and a feature vector f is created by subtracting the speakers' canonical correlation vectors, f = ρ_1 − ρ_2. The low-dimensional feature vector f can then be classified using a vanilla LDA, solving the binary classification problem of whether speaker 1 or speaker 2 was attended.

Reconvolution CCA: The rCCA approach consists of a template-matching classifier that predicts the attended speaker given the neural response evoked by the binary noise-code. The reconvolution model is based on the superposition hypothesis, stating that the response to a sequence of events is the linear summation of the responses evoked by the individual events. For the reconvolution, the event time-series E_i ∈ R^{E×T} for E-many events and T-many samples (here T is one segment size) denotes the onsets of the E events for the i-th noise-code. In this work, we used E = 2 events, being the sh
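The feature construction and the two-class LDA step can be sketched as below. This is a hedged illustration under assumptions: the class `BinaryLDA` is a minimal Fisher discriminant written here for self-containment; in practice a standard implementation such as scikit-learn's `LinearDiscriminantAnalysis` (possibly with shrinkage) would be used.

```python
import numpy as np

def correlation_features(rho_1, rho_2):
    """f = rho_1 - rho_2: difference of the K canonical correlation
    coefficients obtained for speaker 1 and speaker 2."""
    return np.asarray(rho_1, float) - np.asarray(rho_2, float)

class BinaryLDA:
    """Minimal Fisher LDA sketch for the binary 'which speaker was
    attended' problem (assumed stand-in, not the paper's exact model)."""

    def fit(self, F, y):
        F, y = np.asarray(F, float), np.asarray(y)
        m0, m1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
        # pooled within-class scatter, lightly regularized for stability
        Sw = np.cov(F[y == 0].T) + np.cov(F[y == 1].T)
        Sw = np.atleast_2d(Sw) + 1e-6 * np.eye(F.shape[1])
        self.w = np.linalg.solve(Sw, m1 - m0)
        self.b = -0.5 * self.w @ (m0 + m1)  # midpoint threshold
        return self

    def predict(self, F):
        return (np.asarray(F, float) @ self.w + self.b > 0).astype(int)
```

Each trial thus reduces to a K-dimensional feature vector, and the LDA decides which speaker was attended.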
The event matrix is mapped to a structure matrix M_i ∈ R^{M×T} for M-many event time points and T-many samples. This matrix maps each event to an impulse response function. Specifically, this matrix is Toeplitz-like and describes the onset, duration, and, importantly, the overlap of each of the events. Assuming both events evoke a response of identical length L, then M = E·L (here L = 60 for 500 ms at 120 Hz). In this work, we extend the standard rCCA model from Thielen and colleagues to incorporate envelope information. This is a crucial step, because a 1 in the code does not necessitate that there was audio in the stimulus. By incorporating the envelope and combining it with the events in the structure matrix, we avoid expecting an event even though the audio amplitude of the speech signal was zero at that time. This is achieved by element-wise multiplying the event matrix E_i by the amplitudes of the envelope.
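The construction of the envelope-weighted structure matrix can be sketched as follows. The function name `structure_matrix` is hypothetical, and the code is a minimal illustration of the Toeplitz-like mapping described above, not the authors' implementation.

```python
import numpy as np

def structure_matrix(events, L, envelope=None):
    """Build the Toeplitz-like structure matrix M of shape (E*L, T) that
    maps each event's length-L impulse response onto T samples.

    events   : binary event time-series E_i of shape (E, T), 1 = onset
    L        : assumed response length in samples (e.g. 60 for 500 ms
               at 120 Hz)
    envelope : optional speech envelope of length T; if given, onsets are
               element-wise weighted by the audio amplitude, so no response
               is expected where the envelope is zero.
    """
    E, T = events.shape
    events = np.asarray(events, float)
    if envelope is not None:
        events = events * np.asarray(envelope, float)
    M = np.zeros((E * L, T))
    for e in range(E):
        for l in range(L):
            # delayed copy of event e models sample l of its response;
            # overlapping responses sum, per the superposition hypothesis
            M[e * L + l, l:] = events[e, :T - l]
    return M
```

Given a learned length-L response vector per event, the predicted EEG template is then a linear combination of the rows of M.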
Let the dataset {(X_1, y_1), ..., (X_j, y_j), ..., (X_J, y_J)} contain the labeled EEG data for j ∈ {1, ..., J} trials, with EEG data X ∈ R^{C×T} of C-many channels and T-many samples and the associated binary label y ∈ {0, 1}. To find the optimal spatial filter w and temporal response vector r, a CCA maximizes the correlation in the projected spaces:
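This maximization can be sketched with a plain whitened cross-covariance SVD, a textbook route to the first canonical pair. The function name `cca_first_pair` is hypothetical and the solver is an assumption for illustration, not the paper's exact implementation.

```python
import numpy as np

def _inv_sqrt(C):
    # symmetric inverse matrix square root via eigendecomposition
    vals, vecs = np.linalg.eigh(C)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T

def cca_first_pair(S, Z, reg=1e-6):
    """First canonical pair (w, r) maximizing corr(w @ S, r @ Z).

    S : concatenated EEG view, shape (C, N)
    Z : concatenated envelope/design view, shape (M, N)
    """
    S = S - S.mean(axis=1, keepdims=True)
    Z = Z - Z.mean(axis=1, keepdims=True)
    N = S.shape[1]
    Css = S @ S.T / N + reg * np.eye(S.shape[0])
    Czz = Z @ Z.T / N + reg * np.eye(Z.shape[0])
    Csz = S @ Z.T / N
    # whiten both views, then the SVD yields the canonical directions
    Cs, Cz = _inv_sqrt(Css), _inv_sqrt(Czz)
    U, sv, Vt = np.linalg.svd(Cs @ Csz @ Cz)
    w = Cs @ U[:, 0]    # spatial filter for the EEG view
    r = Cz @ Vt[0]      # temporal filter for the envelope view
    return w, r, sv[0]  # sv[0] is the first canonical correlation
```

The remaining singular vectors provide the additional K − 1 orthogonal component pairs, ordered by decreasing canonical correlation.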