Created at 8pm, Mar 26
Ms-RAGScience
An image-computable model of speeded decision-making
jLWsGzqcy5hjytS0TRiAm9n8L3Vsn_gUTzWxsqGpPBI
File Type: PDF
Entry Count: 154
Embed. Model: jina_embeddings_v2_base_en
Index Type: hnsw

Paul I. Jaffe¹, Gustavo X. Santiago-Reyes², Robert J. Schafer³, Patrick G. Bissett¹, and Russell A. Poldrack¹

1 - Department of Psychology, Stanford University
2 - Department of Bioengineering, Stanford University
3 - Lumos Labs

Corresponding author: pijaffe@stanford.edu

March 26, 2024

Abstract

Evidence accumulation models (EAMs) are the dominant framework for modeling response time (RT) data from speeded decision-making tasks. While providing a good quantitative description of RT data in terms of abstract perceptual representations, EAMs do not explain how the visual system extracts these representations in the first place. To address this limitation, we introduce the visual accumulator model (VAM), in which convolutional neural network models of visual processing and traditional EAMs are jointly fitted to trial-level RTs and raw (pixel-space) visual stimuli from individual subjects. Models fitted to large-scale cognitive training data from a stylized flanker task captured individual differences in congruency effects, RTs, and accuracy. We find evidence that the selection of task-relevant information occurs through the orthogonalization of relevant and irrelevant representations, demonstrating how our framework can be used to relate visual representations to behavioral outputs. Together, our work provides a probabilistic framework for both constraining neural network models of vision with behavioral data and studying how the visual system extracts representations that guide decisions.
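To make the architecture concrete, here is a minimal PyTorch sketch of the idea the abstract describes: a small CNN maps a pixel-space stimulus to positive drift rates that parameterize a linear ballistic accumulator (LBA) race. The network sizes, the VAMSketch and lba_simulate names, and all parameter values are illustrative assumptions, not the authors' implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VAMSketch(nn.Module):
    """Toy VAM: a small CNN maps a stimulus image to LBA drift rates."""
    def __init__(self, n_choices=2):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.head = nn.Linear(32, n_choices)  # one drift rate per accumulator

    def forward(self, x):
        h = F.relu(self.conv1(x))
        h = F.relu(self.conv2(h))
        h = h.mean(dim=(2, 3))             # global average pool
        return F.softplus(self.head(h))    # positive drift rates

def lba_simulate(drifts, A=0.5, b=1.0, t0=0.2, noise=0.1):
    """Simulate one LBA trial: linear races from uniform start points.
    (Negative drifts are clamped here for simplicity; full LBA fits
    handle them probabilistically.)"""
    k = torch.rand_like(drifts) * A                 # start points in [0, A]
    v = drifts + noise * torch.randn_like(drifts)   # trial-level drift noise
    t = (b - k) / v.clamp(min=1e-6)                 # time to reach threshold b
    rt, choice = t.min(dim=-1)                      # first accumulator wins
    return rt + t0, choice

model = VAMSketch()
img = torch.randn(1, 1, 64, 64)   # stand-in for a flanker stimulus image
rt, choice = lba_simulate(model(img))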

This may account for our observations that the measures of suppression we considered are not correlated with accuracy congruency effects across models, and that representations for flanker direction do not appear to be strongly suppressed even in later network layers, as evidenced by high decoding accuracy for flanker direction in both the VAMs and task-optimized models.
id: 6350780f51b8411639f30e2a5bba5863 - page: 13
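The decoding-accuracy evidence mentioned above can be illustrated with a standard cross-validated linear decoder over a layer's activations. The sketch below assumes you have extracted activations and flanker-direction labels; the arrays here are random placeholders, not the paper's data or analysis code.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# acts: (n_trials, n_units) hidden activations from one network layer
# flanker_dir: (n_trials,) flanker direction label (e.g., 0=left, 1=right)
rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128))          # placeholder activations
flanker_dir = rng.integers(0, 2, size=500)  # placeholder labels

clf = LogisticRegression(max_iter=1000)
scores = cross_val_score(clf, acts, flanker_dir, cv=5)
print(f"flanker-direction decoding accuracy: {scores.mean():.2f}")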
One apparent limitation of the VAM as presented here is that it does not have dynamics, which seem to be required to explain some observations in the flanker task and related conflict tasks. For example, the RTs on incongruent error trials are typically faster than error RTs for congruent trials and RTs for correct trials, an effect that we confirm is also present in the flanker task variant studied here. This observation can be explained by a "shrinking attention spotlight" in which response activation from flankers starts high and diminishes over time, resulting in a higher proportion of errors for faster RTs. While the primary VAMs analyzed in this paper did not capture these error patterns, we show that the simple modification of training separate VAMs on each RT quantile produced error patterns that closely resemble those observed in the participant data. The representations learned by these models could in principle be compared, allowing one to investigate the mechanism
id: b562f3d2bba1d4947685bdf3069d0a9f - page: 13
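The quantile-wise training modification amounts to a simple data split: bin trials by RT quantile and fit one model per bin. In the sketch below the RTs are simulated placeholders and the train_vam call is hypothetical, named only to mark where per-bin fitting would happen.

import numpy as np

rng = np.random.default_rng(0)
rts = rng.gamma(shape=3.0, scale=0.15, size=2000)  # placeholder RTs (seconds)

# Split trials into RT quantile bins; one VAM would be trained per bin.
n_bins = 5
edges = np.quantile(rts, np.linspace(0, 1, n_bins + 1))
bin_idx = np.clip(np.digitize(rts, edges[1:-1]), 0, n_bins - 1)

for q in range(n_bins):
    trials = np.flatnonzero(bin_idx == q)
    # train_vam(stimuli[trials], rts[trials], choices[trials])  # hypothetical
    print(f"quantile {q}: {trials.size} trials, mean RT {rts[trials].mean():.3f}s")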
Future work may aim to incorporate true dynamics into the visual and decision components of the VAM, using recurrent CNNs and the task-DyVA model, respectively.
id: 9d05336aafe6262765782e10359dd67c - page: 13
In summary, the VAM is a probabilistic model of psychophysical data that captures how raw sensory inputs are transformed into the abstract representations that guide decisions. Raw (pixel-space) visual stimuli are processed by a biologically plausible neural network model of vision that outputs the parameters of a traditional decision-making model. Each VAM is fitted to data from a single participant, a feature that allowed us to study how individual differences in behavior emerge from differences in the "brains" of the models. To this end, we found that models with smaller congruency effects had more orthogonal representations for task-relevant and irrelevant information. While we chose to use a CNN to model visual processing, we note that the VAM is not limited to this choice: other sensory encoding models, such as those based on transformer architectures, can be readily swapped in to replace the CNN with minimal changes to the underlying VAM implementation. Similarly, the LBA
id: 2799a9109dd487ebd26ab50f22e435a2 - page: 13
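One simple way to quantify the orthogonality claim above is to compare the linear coding axes for task-relevant (target) and task-irrelevant (flanker) information, e.g., via the cosine between decoder weight vectors. This is an illustrative measure computed on placeholder data, not necessarily the exact analysis used in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
acts = rng.normal(size=(500, 128))           # placeholder layer activations
target_dir = rng.integers(0, 2, size=500)    # task-relevant label
flanker_dir = rng.integers(0, 2, size=500)   # task-irrelevant label

# One linear decoder per variable; its weight vector is that variable's coding axis.
w_t = LogisticRegression(max_iter=1000).fit(acts, target_dir).coef_.ravel()
w_f = LogisticRegression(max_iter=1000).fit(acts, flanker_dir).coef_.ravel()

cos = np.dot(w_t, w_f) / (np.linalg.norm(w_t) * np.linalg.norm(w_f))
print(f"cosine between coding axes: {cos:.3f}  (near 0 = more orthogonal)")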
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "jLWsGzqcy5hjytS0TRiAm9n8L3Vsn_gUTzWxsqGpPBI", "query": "What is alexanDRIA library?"}'
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "jLWsGzqcy5hjytS0TRiAm9n8L3Vsn_gUTzWxsqGpPBI", "level": 2}'