Backpropagation through space, time, and the brain
Ms-RAGScience · Created at 8pm, Mar 26

Contract ID: pbDQrsB5o7y-To952u9l5EfQ3A47zfFLq7Qes_zYS8o
File Type: PDF
Entry Count: 76
Embed. Model: jina_embeddings_v2_base_en
Index Type: HNSW

Benjamin Ellenberger, Paul Haider, Jakob Jordan, Kevin Max, Ismael Jaras, Laura Kriener, Federico Benitez, Mihai A. Petrovici
∗ Shared first authorship
Department of Physiology, University of Bern, 3012 Bern, Switzerland.

Abstract
Effective learning in neuronal networks requires the adaptation of individual synapses given their relative contribution to solving a task. However, physical neuronal systems – whether biological or artificial – are constrained by spatio-temporal locality. In other words, synapses can only use information available at their physical location and at the same moment in time as the synaptic updates themselves. How such networks can perform efficient credit assignment remains, to a large extent, an open question. In machine learning, the answer is almost universally given by the error backpropagation algorithm, through both space (BP) and time (BPTT). However, BP(TT) is well known to rely on biologically implausible assumptions, in particular with respect to spatio-temporal (non-)locality, while forward-propagation models such as real-time recurrent learning (RTRL) suffer from prohibitive memory constraints. We introduce Generalized Latent Equilibrium (GLE), a computational framework for fully local spatio-temporal credit assignment in physical, dynamical networks of neurons. We start by defining an energy based on neuron-local mismatches, from which we derive both neuronal dynamics via stationarity and parameter dynamics via gradient descent. The resulting dynamics can be interpreted as a real-time, biologically plausible approximation of BPTT in deep cortical networks with continuous-time neuronal dynamics and continuously active, local synaptic plasticity. In particular, GLE exploits the ability of biological neurons to phase-shift their output rate with respect to their membrane potential, which is essential in both directions of information propagation. For the forward computation, it enables the mapping of time-continuous inputs to neuronal space, performing an effective spatiotemporal convolution. For the backward computation, it permits the temporal inversion of feedback signals, which consequently approximate the adjoint states necessary for useful parameter updates.
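To make the phase shift between membrane potential and output rate more concrete, the following is a minimal numerical sketch, not the authors' implementation: it assumes a standard leaky-integrator membrane and computes the output rate from the prospective potential u + τ·du/dt, one simple way to phase-advance the rate relative to the low-pass-filtered potential. All parameter values below are illustrative.

# Minimal sketch (not the paper's code): a leaky-integrator neuron whose
# output rate is computed from the prospective potential u + tau * du/dt.
# With matched time constants, u + tau*du/dt equals the instantaneous input
# drive, so the rate is phase-advanced with respect to the membrane potential.
import numpy as np

tau = 10e-3   # membrane time constant (s), assumed value
dt = 1e-4     # Euler integration step (s)
steps = 2000  # 0.2 s of simulated time

phi = np.tanh                         # static nonlinearity (assumed)
t = np.arange(steps) * dt
inp = np.sin(2 * np.pi * 20 * t)      # 20 Hz sinusoidal drive (arbitrary)

u = 0.0
u_trace, r_trace = [], []
for k in range(steps):
    du = (-u + inp[k]) / tau          # leaky integration: tau du/dt = -u + input
    r = phi(u + tau * du)             # rate computed from the prospective potential
    u += dt * du                      # Euler step for the membrane potential
    u_trace.append(u)
    r_trace.append(r)

# Within the last input period, the rate peak precedes the membrane peak by
# roughly the phase lag introduced by the membrane low-pass filter.
u_arr, r_arr = np.array(u_trace), np.array(r_trace)
last = slice(-500, None)              # last full 50 ms period (500 steps)
lead_ms = (np.argmax(u_arr[last]) - np.argmax(r_arr[last])) * dt * 1e3
print(f"output rate leads the membrane potential by ~{lead_ms:.1f} ms")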

First, we compare GLE with state-of-the-art benchmarks for MNIST-1D (Fig. 6a). Other than the name itself and the number of classes, MNIST-1D bears little resemblance to its classical namesake. Here, each example is a one-dimensional sequence of points, which is presented to the network as a temporal sequence. In order to correctly classify a sample, the network must be able to store a (processed) memory of the input's past. Unlike networks trained with BPTT, a GLE network must perform this combination of dynamic memory, learning, and classification online.
id: d9ec06d72eadb241bd67c52a3a91c938 - page: 8
In Fig. 6b, we can see how a multi-layer perceptron (MLP) fails to learn to classify this dataset adequately, reaching a validation accuracy of only around 60%. This is despite the perceptron having simultaneous access to the entire sample, unrolled from time into space. This highlights the difficulty of the MNIST-1D task, where the network needs to learn to neglect different kinds of temporal noise on multiple time scales. More involved machine learning methods achieve much better results, with temporal convolutional networks (TCNs) and gated recurrent units (GRUs) achieving averages of over 90%. Notably, both of these methods work offline, with TCNs in particular requiring a mapping of temporal signals to spatial representations beforehand, and GRUs requiring offline BPTT training.
id: 5c8f00e6fadcb28430a7e308ac1e724d - page: 8
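To make the offline/online distinction above concrete, here is a toy sketch (not the paper's models) contrasting the two ways a temporal sample can be presented: unrolled into space, as for the MLP and TCN baselines, versus streamed in time, where only a leaky internal state carries information about the past, as in the online setting faced by GLE. The sequence and the readouts are illustrative stand-ins.

# Toy illustration (not the paper's models): the same 1D sequence either
# "unrolled into space" (the whole sequence as one input vector, offline)
# or streamed one value per time step, where only a leaky state carries the past.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(40)        # stand-in for one MNIST-1D-like sequence

# Offline, unrolled-into-space view: the full sequence is available at once.
w_spatial = rng.standard_normal(40)
offline_readout = w_spatial @ x    # e.g. one linear unit reading the whole sample

# Online, streamed view: at step t only x[t] and the network's own state are
# visible, so any use of the past must go through that (leaky) state.
tau = 5.0                          # memory time constant in steps (assumed)
state = 0.0
for x_t in x:
    state += (x_t - state) / tau   # leaky integration of the input stream
online_readout = state             # what remains of the sequence at the end

print(offline_readout, online_readout)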
Because GLE networks learn fully online, they exhibit longer convergence times, as they are often confronted with counterproductive combinations of inputs and targets (3/4 of the total input in MNIST-1D is noise). Still, the prospective errors manage to provide good online approximations of the true gradients for updating the network parameters. Thus, despite facing a significantly more difficult task compared to the methods that have access to the full network activity unrolled in time, GLE achieves highly competitive classification results.
id: db97e3254564ff58aa31846c7c18a344 - page: 8
In Fig. 6c, we show the results of GLE on the Google Speech Commands dataset. This dataset consists of 105,829 one-second audio recordings of 35 different speech commands, each spoken by thousands of people. In the 12-class task of version 2 of this dataset, the goal is to classify 10 different speech commands in addition to a silence class and an unknown class, the latter comprising all remaining commands. The raw audio signal is transformed into a sequence of 32 Mel spectrogram frames with 32 frequency bins each to make it a more salient input for classification. As for MNIST-1D, the GLE network is trained online, while the other networks are trained offline, with the MLP and TCN using all temporal data unrolled into space. On this task, GLE surpasses the MLP and achieves a performance that comes close to TCN and GRU networks. We thus conclude that the advantages of GLE in terms of biological plausibility and online learning capability come at little cost in terms of task performance.
id: 62f9579c8e35b9e7d75b925d02c36426 - page: 9
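The Mel spectrogram preprocessing described above can be sketched as follows; the 32 × 32 output shape is the only constraint taken from the text, while the sampling rate, window, hop length, and dB scaling are assumptions (the excerpt does not specify them), and librosa is just one convenient library for this step.

# Sketch of the preprocessing described above: a one-second audio clip mapped
# to 32 Mel spectrogram frames with 32 frequency bins each. Concrete parameter
# values (n_fft, hop_length, dB scaling) are assumptions.
import numpy as np
import librosa

sr = 16000                                   # Speech Commands clips are 1 s at 16 kHz
y = np.random.randn(sr).astype(np.float32)   # stand-in for a real waveform

mel = librosa.feature.melspectrogram(
    y=y,
    sr=sr,
    n_fft=1024,      # analysis window length (assumed)
    hop_length=512,  # 16000 / 512 -> 31 hops, i.e. 32 centered frames
    n_mels=32,       # 32 Mel frequency bins
)
mel_db = librosa.power_to_db(mel, ref=np.max)   # log-compress the power spectrogram

print(mel_db.shape)   # (32, 32): frequency bins x time frames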
How to Retrieve?
# Search

curl -X POST "https://search.dria.co/hnsw/search" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"rerank": true, "top_n": 10, "contract_id": "pbDQrsB5o7y-To952u9l5EfQ3A47zfFLq7Qes_zYS8o", "query": "What is alexanDRIA library?"}'
        
# Query

curl -X POST "https://search.dria.co/hnsw/query" \
-H "x-api-key: <YOUR_API_KEY>" \
-H "Content-Type: application/json" \
-d '{"vector": [0.123, 0.5236], "top_n": 10, "contract_id": "pbDQrsB5o7y-To952u9l5EfQ3A47zfFLq7Qes_zYS8o", "level": 2}'