13 The evolution of convolutional neural networks (CNNs)

13.1 What are Convolutional Neural Networks (CNNs)?

Neural networks, whether biological or artificial, can learn patterns. Visual patterns can appear at different points in the field of view, and other patterns can likewise shift their position in the signal space. It was therefore necessary to develop neural networks in which a shift in a pattern does not impair its recognition. This is precisely the capability that CNNs possess.

Convolutional Neural Networks (CNNs) are a special class of artificial neural networks that are particularly good at recognising spatial patterns in data. Whilst classical (‘fully connected’) neural networks link every input signal to every neuron, CNNs employ a different strategy: they work with local receptive fields, weight sharing and convolutional filters. This enables them to efficiently recognise structures such as edges, textures or complex shapes in images or other spatially organised signals.

The key difference from conventional neural networks is therefore that CNNs explicitly exploit spatial neighbourhoods. They process signals not globally, but layer by layer and locally – much like biological visual systems, in which neurons also respond only to limited sections of the visual field and extract specific features from them.
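The effect of local receptive fields and weight sharing can be illustrated with a minimal sketch. The function names, the 3×3 ‘plus’ pattern and the 8×8 image size are hypothetical choices made only for this illustration; the sketch slides one shared filter over an image and shows that moving the pattern moves the response peak by the same amount, so the pattern is still detected.

```python
def correlate2d_valid(image, kernel):
    """Slide one shared kernel over the image (only positions where it fits)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def place(pattern, top, left, size=8):
    """Return a size x size zero image with the pattern copied in at (top, left)."""
    image = [[0] * size for _ in range(size)]
    for i, row in enumerate(pattern):
        for j, value in enumerate(row):
            image[top + i][left + j] = value
    return image


def argmax2d(grid):
    """Row and column of the largest entry of a 2D list."""
    _, r, c = max((v, r, c) for r, row in enumerate(grid) for c, v in enumerate(row))
    return r, c


# Hypothetical 3x3 'plus' pattern, used both as the stimulus and as the filter.
pattern = [[0, 1, 0],
           [1, 1, 1],
           [0, 1, 0]]

peak_a = argmax2d(correlate2d_valid(place(pattern, 1, 1), pattern))
peak_b = argmax2d(correlate2d_valid(place(pattern, 4, 3), pattern))
print(peak_a, peak_b)  # the peak follows the pattern's position
```

Because the same nine weights are reused at every position, the filter needs no retraining when the pattern appears elsewhere in the image.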

13.2 Historical origins

The basic idea behind CNNs did not originate in computer science, but was directly inspired by neurobiology. In 1979, the Japanese researcher Kunihiko Fukushima developed the Neocognitron, a model explicitly based on the visual cortex that already contained central elements of modern CNNs. It was the first artificial system to combine feature detectors and hierarchical processing.

In the late 1980s and early 1990s, Yann LeCun and colleagues took up these ideas and combined them with the backpropagation learning method. This gave rise to the famous LeNet architecture, which was first used successfully in practice for the automatic recognition of handwritten digits. This marked the beginning of the triumphant advance of CNNs in modern AI.

Why CNNs are relevant to neuroscientists

CNNs are of interest to neurologists and brain researchers because they offer an algorithmic perspective on how sensory brain areas function. Many principles of CNNs – local filters, hierarchical feature extraction, pooling – are found in strikingly similar forms in the visual system of vertebrates. CNNs are therefore not only a tool of AI, but also a model for biological information processing.

13.3 The emergence of biological CNNs in evolution

In the vertebrate brain, we find all classes of neural networks. The emergence of fully connected neural networks in the reticular formation was described in the previous chapter. The development of convolutional neural networks (CNNs) in the brain occurred gradually through a series of intermediate stages, following the sequence of evolutionary developments in the vertebrate brain. The starting point was the neural competition between the two halves of the body in bilaterally organised vertebrates. Over a prolonged period of development, this led to the formation of the cerebellum, which was subdivided into the vestibulocerebellum, spinocerebellum and pontocerebellum. The growth of dendritic fields in the retina and cerebellum then led directly to the formation of CNNs in the vertebrate brain.

13.3.1 Neural competition between the sides of the body and the signal pathways of the input stage

Initial situation: bilateral signal input

The motor medullary nucleus of the brain, the reticular formation (or its precursor), was located in the input stage of the brain.
Median nuclei supplied control signals for the body’s autonomic life-support system.
Each side of the body supplied its motor and sensory signals to the bilaterally organised reticular formation.
This resulted in two parallel signal streams from the two sides of the body, which were initially processed independently of one another.

Nucleus olivaris

Neuronal competition developed between the two sides of the body at a relatively early stage.
The olivary nucleus – also present bilaterally – acts as a cross-linking nucleus, facilitating communication between the two sides of the body.
It ensures that signals from the left and right sides do not remain isolated, but can compete with one another.
This competition is functionally necessary: it ensures that the stronger signal from one side of the body inhibits the weaker signal from the other side and thus prevails. Only in this way was it possible to hunt for prey or flee from predators.
Initially, the inhibition of the opposite side was limited to the average signals. It was only in the course of evolution that other signals could be incorporated.
This inhibition was achieved by the contralateral signals terminating on inhibitory interneurons, which in turn inhibited the ipsilateral output neurons. These interneurons developed into Purkinje cells, which, in the course of evolution, separated from the reticular formation and went on to form the cerebellar cortex.

Inhibitory Purkinje cells

The olivary nucleus transmits signals from the opposite side of the body to the Purkinje cells.
In the reticular formation, Purkinje cells inhibit the ipsilateral signals.
In this way, each side of the body suppresses the signals of the opposite side, with the Purkinje cells mediating the competition.
Result: Only the stronger side prevails – a ‘winner-takes-all’ principle at the bilateral level.

13.3.2 The development of the spinocerebellum

The Purkinje cells initially formed a neuronal nucleus, which we can refer to as the Purkinje nucleus; the later cerebellar cortex developed from it. The Purkinje cells of the developing cerebellum inhibited, amongst other things, the neurons of the reticular formation. Thus, the two average-value nuclei of the two sides of the body were in direct competition with one another. However, because the average signals possessed greater activity, their inhibition was only relative and never total: the activity on the inhibited side of the body was not allowed to cease entirely. The aim was simply to find a balance between the two sides of the body, in which stronger signals inhibited the weaker ones.

Signals usually represent a primary variable, e.g. muscle strength, the force acting on tactile receptors, or brightness. Here, there is a functional dependence of the firing rate on the primary variable. This can be strictly monotonically increasing or strictly monotonically decreasing.

The relative inhibition of stronger average signals by input signals reverses the monotonicity of these input signals. For example, a strictly monotonically increasing signal curve gives rise to a strictly monotonically decreasing signal curve. We refer to this reversal of monotonicity caused by relative inhibition as signal inversion. It became a primary function of the early cerebellum. The inversion of vestibular signals was necessitated by the transformation of the paleovestibular system into the neovestibular system. The associated part of the cerebellum became the vestibulocerebellum.
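Signal inversion can be made concrete with a small numerical sketch. It assumes, purely for illustration, that relative inhibition acts as a subtraction of the input signal from a stronger, constant average signal, with the result clipped at zero because firing rates cannot be negative. The function name, the gain parameter and the firing rates are hypothetical.

```python
def relative_inhibition(average, signal, gain=1.0):
    """Average signal reduced by an inhibiting input signal, never below zero.

    Assumption for this sketch: inhibition is modelled as simple subtraction.
    """
    return max(0.0, average - gain * signal)


# Hypothetical firing rates of a strictly increasing input signal.
inputs = [0.0, 10.0, 20.0, 30.0, 40.0]
average = 50.0  # stronger average signal (assumed constant here)

inverted = [relative_inhibition(average, s) for s in inputs]
print(inverted)  # [50.0, 40.0, 30.0, 20.0, 10.0]
```

The increasing input curve becomes a decreasing output curve, and because the average signal is the stronger one, the output never falls to zero: the inhibition stays relative.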

Motor signals also found their way via the olivary nucleus to the opposite side of the cerebellum. Through this relative inhibition, the signal strength of the opposite side was inverted. Each muscle now relatively inhibited its antagonist on the opposite side. A residual signal remained, which inversely excited the antagonist. Neurologists refer to this mechanism as co-activation: both muscles were tense, one more strongly, the other more weakly. This co-activation enabled a more precise adjustment of joint angles and simultaneously counteracted the effects of gravity. The corresponding part of the cerebellum developed into the spinocerebellum. This created the functional prerequisite for vertebrates to walk on land.

As the number of motor signals from the opposite side flowing into the spinocerebellum increased, the original reticular formation disintegrated. The portion that served exclusively for signal inversion developed into an independent nucleus – the cerebellar nucleus. This further differentiated depending on the origin of the signals. The nucleus fastigii processed vestibular signals. The nucleus globosus and the nucleus emboliformis processed motor signals. The average signal required for signal inversion originated from the reticular formation, which continued to serve the autonomic life-sustaining system. This average signal input is later lost in the pontocerebellum.

The vestibulo- and spinocerebellum initially served exclusively to invert signals from the opposite side.

We group the vestibulocerebellum and the spinocerebellum together as the inversion cerebellum. Its main task initially consisted of inverting the contralateral signals to co-activate the motor antagonists. Due to the signal divergence that set in later, the spinocerebellum developed into a kind of digitisation circuit: the strength of a signal was represented as a scale value and converted into a sparsely coded signal vector in which the position of the single non-zero entry represented the signal strength. This resembles a one-hot vector, although one that is additionally normalised.
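The idea of such a scale-value code can be sketched as follows. This is only an illustration of the one-hot aspect: the function name, the number of output lines and the linear scale are assumptions, and the additional normalisation mentioned above is not specified here, so the active entry is simply set to 1.

```python
def scale_code(strength, max_strength, n_lines):
    """Encode a signal strength as a vector with exactly one active position.

    Assumption for this sketch: the scale is linear and divided into
    n_lines equally wide bins.
    """
    if not 0 <= strength <= max_strength:
        raise ValueError("strength must lie between 0 and max_strength")
    index = min(int(strength / max_strength * n_lines), n_lines - 1)
    return [1.0 if i == index else 0.0 for i in range(n_lines)]


code = scale_code(35.0, 100.0, 10)
print(code)  # the single non-zero entry sits at position 3
```

A stronger signal activates a position further along the vector, while the number of active lines stays at one.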

13.3.3 The development of the pontocerebellum

The distinctive topological feature of the pontocerebellum is that its signals originate in the cortex. Whilst the spinocerebellum and vestibulocerebellum derive their inputs primarily from the spinal cord and vestibular system, the pontocerebellum receives its signals via the pons from virtually all cortical areas. These corticopontine pathways form the largest projection tract in the brain and make the pontocerebellum the primary entry point for cortical patterns.

For the spinocerebellum, the following applies:

· Input from the contralateral side originates from the local red nucleus and travels via the olivary nucleus into the climbing fibre system.

· Input from the mossy fibre system comes from the ipsilateral half of the trunk.

For the pontocerebellum:

· Input from the contralateral cortex comes from the local nucleus olivaris into the climbing fibre system. The nucleus ruber is not involved in the signal pathway.

· The mossy fibre input comes from the contralateral cortex via the pontine nuclei.

Unlike the older cerebellar structures, the pontocerebellum is decoupled from the averaging function of the reticular formation. It generates the averaging signals necessary for its function autonomously from its own input signals. Due to the great spatial distance, there is no longer a direct connection to the reticular formation. This independence marks a decisive step in evolution:

The pontocerebellum paved the way for the higher intelligence of primates and humans, whilst this structure is presumably only rudimentary in other species.

Nevertheless, there is a crucial difference compared to the inversion cerebellum:

Climbing fibre signals and mossy fibre signals have the same signal origin.

In particular, the mossy fibre signals were a kind of signal copy of the climbing fibre signals. In the inversion cerebellum, the climbing fibre signals came from the opposite side of the body, whereas the mossy fibre signals came from the same side. In the pontocerebellum, this distinction no longer applies.

Neurologists and AI researchers should gradually get used to the idea that in the vertebrate brain, particularly in primates and Homo sapiens, there are a total of four signal copies arising from the cortical signals:

· The projection to the ipsilateral nucleus olivaris and from there to the climbing fibre system of the contralateral cerebellum (climbing fibre projection)

· The projection via the pontine nuclei to the mossy fibre system of the contralateral cerebellum (mossy fibre projection)

· The projection into the basal ganglia system (matrix and striosomes)

· The projection into the Papez circuits of the limbic system.

The climbing fibre projection and the mossy fibre projection to the pontocerebellum formed the basis for pattern recognition in convolutional neural networks (CNNs).

Each climbing fibre terminated at a Purkinje cell assigned to it. Its axon climbed up the dendritic tree of the Purkinje cell, forming numerous synapses there that strongly excited the Purkinje cell whenever the climbing fibre carried a signal. Similarly, the mossy fibre copy associated with the climbing fibre, which carried the same cortical signal, could excite this Purkinje cell: it contacted granule cells whose parallel fibres had an excitatory effect on the Purkinje cell whenever a cortical signal was active on this pathway. The Purkinje cell used GABA as a neurotransmitter and, when active, inhibited an associated output neuron of the dentate nucleus.

In addition, the climbing fibre and its mossy fibre copy excited precisely this output neuron of the dentate nucleus, which was the cerebellar nucleus of the pontocerebellum. Since the excitation of the Purkinje cell was caused by the same signals as the excitation of the dentate neuron, excitation and inhibition were of equal strength, and there was no output in this case.

Differential circuit in the pontocerebellum:

A differential circuit was implemented in the cerebellum, in which the difference in excitation between the dentate neuron and the Purkinje cell determined the strength of the output signal. If the excitation of the Purkinje cell was stronger than that of the dentate neuron, there was no output, as neurons do not have negative firing rates.

High excitation of a Purkinje cell meant that there was virtually no output. This is precisely the inverse mechanism.

An output could only occur if the remaining mossy fibres, which were not copies of the climbing fibre signal, exerted an inhibitory effect on the Purkinje cell. To this end, these mossy fibres contacted granule cells. Their axons ascended to the Purkinje cell layer, where they formed the long T-shaped parallel fibres. These made contact with stellate cells and basket cells, which in turn inhibited the Purkinje cells. As a result, the reduced excitation of the Purkinje cell was no longer sufficient to completely inhibit the dentate neuron. A residual signal remained, the intensity of which increased with the strength of the mossy fibre input.
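The differential circuit described above can be written down as a minimal rate model. The sketch assumes that excitation and inhibition combine by simple subtraction, that the Purkinje cell inhibits the dentate neuron with exactly the strength of its own excitation, and that all rates are clipped at zero. The function names and the gain parameter are hypothetical.

```python
def relu(x):
    """Firing rates cannot be negative."""
    return max(0.0, x)


def dentate_output(own_signal, other_mossy_input, inhibition_gain=1.0):
    """Residual output of the dentate neuron in the differential circuit.

    own_signal: cortical signal carried by the climbing fibre and its mossy copy.
    other_mossy_input: mossy fibre input that is not a copy of the climbing fibre
        signal; it inhibits the Purkinje cell via stellate and basket cells.
    Assumption for this sketch: all effects add and subtract linearly.
    """
    dentate_excitation = own_signal
    purkinje_excitation = relu(own_signal - inhibition_gain * other_mossy_input)
    # The Purkinje cell inhibits the dentate neuron; only the difference remains.
    return relu(dentate_excitation - purkinje_excitation)


for other in (0.0, 4.0, 8.0, 20.0):
    print(other, dentate_output(10.0, other))
```

With no additional mossy fibre input the output is zero; as that input grows, the residual signal grows with it until the Purkinje cell is silenced and the dentate neuron passes its full excitation.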

Through Hebbian learning, the synaptic strength was increased when the climbing fibre and the mossy fibres were active simultaneously. In this way, the Purkinje cell was able to learn the most frequent cortical signal. As the climbing fibre grew in length, it could make contact with several Purkinje cells, enabling this neural network to learn several different cortical signals, exactly one per Purkinje cell.

In this context, it proved useful for the Purkinje cells to be able to evaluate signals from further away. Thus, the process of enlarging the dendritic trees of the Purkinje cells began. This was to have significant implications for the type of biological neural network in the cerebellum.

13.3.4 Growth of the dendritic fields and the onset of the convolution algorithm

This convergence of more distant signals onto the Purkinje cells forms the evolutionary substrate for the subsequent emergence of a convolutional mechanism.

In the visual cortex V1, the dendritic fields of the pyramidal cells began to expand, developing into magnocellular pyramidal cells. In the pontocerebellum, exactly one Purkinje cell was assigned to each magnocellular pyramidal cell from V1, which was excited by that pyramidal cell. The associated dentate neuron was also excited by this signal.

At the same time, the output signal from precisely this pyramidal cell reached both this Purkinje cell and the associated dentate neuron via the mossy fibre system; both were excited.

On balance, excitation and inhibition cancelled each other out, as they (initially) originated from the same signal source. On its own, this neural differential circuit would serve no purpose at all. However, the dendritic trees of the neurons involved in the cerebral cortex and the cerebellar cortex continued to grow, and they did so unevenly.

As their dendritic trees grew, the Purkinje cells surpassed the dendritic fields of the magnocellular pyramidal cells in the cortex. Consequently, a Purkinje cell with its very large dendritic tree was also able to receive signals from neighbouring cortical pyramidal cells. These competing signals had an inhibitory effect on the Purkinje cell, as there was a neural competition between the dendritic fields in the cortex. This carried over to the signals from the cerebellum. Each Purkinje cell was excited by the signals from its own cortical pyramidal cell, but inhibited by those from neighbouring pyramidal cells. However, only the signal from its own pyramidal cell excited the dentate neuron.

The inhibition occurred via stellate and basket cells, which in turn were excited by those parallel fibres that received signals from the neighbouring cortical pyramidal cells via the mossy fibres. This is why the parallel fibres and the axons of the basket cells were so extremely long.

A residual signal remained, which could become strong when the neighbouring fields in the cortex were strongly excited. This was the beginning of a convolution algorithm: the Purkinje cell could (for example) receive the output from nine cortical fields – like a 3×3 mask superimposed over the visual image.

We divide the signals from these nine fields, which form a mask, into inner signals and outer signals. The inner signals originate from the central field, the outer signals from the peripheral fields. We can then formulate the signal algorithm of the pontocerebellum more precisely:

· The inner signals excite the Purkinje cell via the climbing fibre and, with the help of the parallel fibres, via the mossy fibres; they also excite the associated output neuron in the dentate nucleus.

· The outer signals inhibit the Purkinje cell via the mossy fibres with the help of intervening parallel fibres, stellate cells and basket cells.

· An output arises in the dentate neuron if and only if the inner signals excite the Purkinje cell and the associated dentate cell while the outer signals inhibit the Purkinje cell, because the balance between the inner excitation of the dentate neuron and the inner excitation of the Purkinje cell is then disrupted. This corresponds to a conjunctive link between the inner signals and the outer signals. This signal conjunction distinguishes biological networks of the cerebellum from artificial networks.

· Thus, a differential circuit was realised in the cerebellum, in which the excitation difference between the dentate neuron and the Purkinje cell produced the output signal.
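The signal algorithm for a 3×3 mask can be sketched as a small rate model. It assumes linear summation of the eight outer fields, an equal weight of 0.125 per outer field, subtraction as the form of inhibition and clipping at zero; the function names and all numerical values are hypothetical and serve only to show the conjunction of inner and outer signals.

```python
def mask_response(image, r, c, outer_weight=0.125):
    """Dentate output for the 3x3 mask centred on (r, c).

    Assumption for this sketch: the inner signal excites both the Purkinje cell
    and the dentate neuron, and the summed outer signals inhibit only the
    Purkinje cell.
    """
    inner = image[r][c]
    outer = sum(image[r + dr][c + dc]
                for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0))
    dentate_excitation = inner
    purkinje_excitation = max(0.0, inner - outer_weight * outer)
    return max(0.0, dentate_excitation - purkinje_excitation)


def response_map(image, outer_weight=0.125):
    """Apply the same mask at every interior position of the image."""
    return [[mask_response(image, r, c, outer_weight)
             for c in range(1, len(image[0]) - 1)]
            for r in range(1, len(image) - 1)]


inner_only = [[0, 0, 0], [0, 8, 0], [0, 0, 0]]
outer_only = [[8, 8, 8], [8, 0, 8], [8, 8, 8]]
both = [[8, 8, 8], [8, 8, 8], [8, 8, 8]]
partial = [[0, 8, 0], [8, 8, 8], [0, 8, 0]]

for name, img in [("inner only", inner_only), ("outer only", outer_only),
                  ("both", both), ("partial", partial)]:
    print(name, response_map(img))
```

In this simplified model the output equals the smaller of the inner excitation and the weighted outer inhibition, so it is non-zero only when both are present, which is the conjunctive behaviour described in the list above.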

This resulted in an explicit anatomical two-channel structure in the pontocerebellum:

· Purkinje and dentate as a coupled comparison unit

· Internal/external division of the input

Conjunction via structure, not just via activity:

· not just ‘many inputs → one non-linearity’

· but: ‘a structurally divided pattern + shift in balance’

CNNs and MLPs have:

· linear filters + scalar non-linearities

· no Purkinje–dentate pairs as explicit comparison units

· no inner/outer difference structure per se

They can approximate something similar, but not in this explicit, biophysically wired form.

AI systems:

· work with continuous functions and statistical approximation,

· lack this explicit difference circuit and the clear inner/outer separation,

· merely ‘approximate’ such a conjunction, particularly with large amounts of input.

The input arriving at the Purkinje cell represents a particular visual pattern.

Through Hebbian learning, the most frequent visual pattern from these nine ocular dominance columns was imprinted in the synapses of the Purkinje cell. The output reached the dentate nucleus and, via lateral inhibitory connections, led to the suppression of neighbouring dentate neurons. In addition, each output neuron of the dentate nucleus inhibited the corresponding climbing fibre signal in the olivary nucleus – and thus also the input signal from the cortex. Consequently, another Purkinje cell could no longer perceive this signal.

If a visual pattern was not recognised, the climbing fibre remained active in the olivary nucleus and produced neurotrophic growth factors in the pontocerebellum. These factors promoted the maturation of proneurons into Purkinje cells. Thus, over the course of evolution, further Purkinje cells emerged to which the same climbing fibre could attach. These new Purkinje cells were able to learn the next visual patterns in order of their statistical frequency. Later, LTD and LTP accelerated Hebbian learning, so that new visual patterns could be learnt within seconds.

The more Purkinje cells that could be reached by a climbing fibre, the more different patterns the CNN could learn, which conferred an advantage on the vertebrate. In principle, the Hebbian learning process followed Sanger’s rule, albeit with some modification. This is because the input signals were not symmetrical about the zero point, as there are no negative firing rates. Consequently, the Purkinje cells did not learn the principal components, but rather the signals in order of their statistical frequency.
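For reference, the unmodified textbook form of Sanger’s rule (the generalised Hebbian algorithm) can be sketched as follows. This is not the modified rule described above: the sketch deliberately uses zero-mean inputs, for which the first output neuron converges towards the first principal component. The data distribution, learning rate, step count and function names are assumptions made for this illustration.

```python
import random


def sanger_step(weights, x, lr):
    """One update of Sanger's rule for a list of weight vectors (one per output)."""
    y = [sum(w_j * x_j for w_j, x_j in zip(w, x)) for w in weights]
    recon = [0.0] * len(x)  # reconstruction of x from outputs 0..i
    new_weights = []
    for i, w in enumerate(weights):
        recon = [r + y[i] * w_j for r, w_j in zip(recon, w)]
        # Hebbian term y_i * x minus the part already explained by outputs 0..i.
        new_weights.append([w_j + lr * y[i] * (x_j - r)
                            for w_j, x_j, r in zip(w, x, recon)])
    weights[:] = new_weights
    return y


def train(n_steps=5000, lr=0.01, seed=0):
    """Train two output neurons on hypothetical zero-mean 2D data."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(2)] for _ in range(2)]
    for _ in range(n_steps):
        # Most of the variance lies along the first input axis.
        x = [rng.gauss(0.0, 1.0), 0.3 * rng.gauss(0.0, 1.0)]
        sanger_step(weights, x, lr)
    return weights


weights = train()
print(weights[0])  # close to (+1, 0) or (-1, 0), the dominant direction
```

With non-negative firing rates the inputs are no longer centred on zero, which is why the biological variant described above is said to learn patterns by frequency rather than principal components.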

The convolution operation, which gives CNNs their translation invariance for patterns, is emergent in the cerebellum and does not need to be explicitly implemented. A major advantage of biological CNNs in the brain is the parallel operation of all output neurons.

13.4 Second main theorem for biological neural networks

The first convolutional networks arose through the comparison of the different excitations of the Purkinje cell and the associated dentate neuron via a difference circuit. The disproportionate growth of the dendritic fields of Purkinje cells enabled the integration of signals from neighbouring cortical fields, resulting in a structural convolution operation that functionally corresponds to a convolutional neural network, yet does not require an explicit convolution algorithm.

This also explains why people who lack Purkinje cells and a cerebellar cortex from birth are nevertheless hardly impaired. As long as their cerebellar nuclei are present, these function as neural networks and can thus perform the necessary motor and cognitive tasks.

13.5 Methodological aside: Neuroanatomy as an archaeological search for clues

The current organisation of sensory pathways in the vertebrate brain is not only a functional outcome but also an evolutionary archive. Many anatomical features – separate thalamic layers, crossed and uncrossed visual pathways, parallel projection pathways via the pons and olive, topological maps in the cerebellum – are not random constructions. They are remnants of earlier stages of development that have been preserved over millions of years because they were embedded in the fundamental architecture of information processing.

These structures can therefore be interpreted like archaeological finds: as preserved evidence of algorithmic principles that were already at work in early vertebrates. The separation into ipsilateral and contralateral sublayers in the thalamus, the different transmission via mossy and climbing fibre systems, and the precise topological organisation in the pontocerebellum form a chain of evidence pointing to a common functional origin. They demonstrate that the brain not only processes signals but also utilises computational principles that have gradually emerged and been refined over the course of evolution.

In this sense, neuroanatomy serves as an archaeological window into the past of neural signal processing. Its preserved patterns allow us to reconstruct hypothetical lines of development and plausibly trace the emergence of complex mechanisms – such as the convolution operations in the pontocerebellum.

13.6 Addendum at the end of the chapter

This raises the question of whether individuals without a cerebellar cortex could possess CNNs in the cerebellar nuclei. The absence of Purkinje cells is equivalent to the loss of access to several neighbouring dendritic fields of the cortex. However, since the cortex generally also projects into the cerebellar nuclei via the mossy fibre system, this deficiency could be compensated for. To achieve this, the dentate neurons would need to expand their dendritic fields so that input from neighbouring cortical dendritic fields can also be received. An increase in the size of the dendritic fields in the dentate nucleus in individuals without a cerebellar cortex could be an indication that the brain is indeed able to compensate for this absence, at least to some extent.

All the circuits, modules and neural networks described so far still belong to the primary brain system. This subsystem is the oldest in evolutionary terms and processes analogue signals. The signal strength of the primary variables perceived via receptors is encoded by the firing rate.

Neural networks were able to form within the primary system because, on the one hand, averaging systems such as the reticular formation provided the technical foundation. On the other hand, the larger receptive fields in the cortex and cerebellum created the conditions for signal analysis in convolutional neural networks (CNNs).

The next evolutionary step was the emergence of vertical signal divergence in the olivary nucleus, as well as the planar or spatial divergence modules in the cortex. These developments led to the emergence of a new class of neural networks, referred to in AI terminology as transformers. They are described in detail in a separate chapter.

For visual pattern recognition, it is important to note that CNNs require a cortical projection to the olivary nucleus. The cortico-olivary tract has been clearly demonstrated in humans. In lower vertebrates, it is presumed to exist as well, though it remains numerically insignificant. Whilst humans possess around 5,000 ocular dominance columns for the visual CNN, lower vertebrates might manage with as few as 100 columns. Their axons would be barely noticeable in the brain.

The circuits and networks described so far belong entirely to the primary brain system. Within this system, the receptive fields of the ganglion cells already respond to periodic changes in brightness caused by partial occlusion. These fields are capable of processing simple fluctuations in signal intensity and thus form the basis for visual pattern recognition in CNNs.

The later orientation columns in the cortex, however, are a newly formed neural structure that does not yet exist at the stage of development described here. They only emerge millions of years later in evolution and already belong to the next subsystem of the brain. Their function – the periodic fluctuation in signal strength when a straight line is rotated within the field of view – is a hallmark of the Transformer architecture, which is described in later chapters.

The mathematical description of the periodic changes in brightness caused by the partial occlusion of a circular receptive field is presented in detail in Chapter 13 of my theory of the brain. For the purposes of understanding, it suffices here to note that the chord length of a straight line within a circle depends on its distance from the centre. This simple geometric relationship explains the periodic fluctuations in signal strength, which later lead to the formation of the orientation columns.
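The geometric relationship can be stated explicitly: in a circle of radius r, a straight line at distance d from the centre cuts a chord of length 2·√(r² − d²). The following sketch only evaluates this formula; the function name and the sample values are hypothetical, and the link to firing rates is not modelled here.

```python
import math


def chord_length(radius, distance):
    """Length of the chord cut from a circle by a line at the given distance
    from the centre; zero if the line misses or merely touches the circle."""
    if abs(distance) >= radius:
        return 0.0
    return 2.0 * math.sqrt(radius ** 2 - distance ** 2)


for d in (0.0, 3.0, 4.0, 5.0):
    print(d, chord_length(5.0, d))
```

The chord is longest when the line passes through the centre and shrinks to zero as the line approaches the edge of the circular field.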