Deeply Interactive Learning Systems

By David I. Spivak and Timothy Hosgood

Category theory can be described as “mathematics for making analogies precise.” We can use it to replace vague-sounding comparisons with encompassing mathematical structures. Here we discuss a specific example of this phenomenon.

An analogy is often made between deep neural networks (DNNs) and actual brains, suggested by the nomenclature itself: the “neurons” in DNNs should correspond to neurons (or nerve cells, to avoid confusion) in the brain. However, we claim that this analogy does not even type check: it is structurally flawed. In agreement with the slightly glib summary of Hebbian learning as “cells that fire together wire together,” we make the case that the analogy should be different. Since the “neurons” in DNNs manage the changing weights, they are more akin to synapses in the brain; it is instead the wires that are like nerve cells, in that they enable information flow. But nerve cells are more than mere wires — they are dynamical systems; we will explain this further throughout the article. We will also continue to highlight the error in equating artificial neurons in a DNN with nerve cells by keeping “neuron” in quotation marks or by using the term “artificial neuron” instead.

We will first explain how to view DNNs as nested dynamical systems with a very restricted sort of interaction pattern, then explore a more general interaction pattern for dynamical systems that is useful in engineering but unable to adapt to changing circumstances. It is as though two explorers landed on opposite sides of the same continent without knowing it. There is an encompassing generalization of both DNNs and interacting dynamical systems, which we call deeply interactive learning systems (DILS).

Neural Networks Learn But Are Not Deeply Interactive

The process of training a DNN is as follows (see the code sketch after this list):

  1. The user provides an input-output pair (a training datum), or a batch thereof.
  2. The DNN uses its current parameters to push the given input from the start to the end of the network. 
  3. The DNN compares the final result of this forward pass against the given output and propagates the error backward through the network, updating its parameters as it goes.
  4. The whole process repeats many times. 
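
As a minimal sketch of this loop in code, consider a single artificial “neuron” trained by gradient descent on a squared loss; the names and the toy task here are our own illustration, not those of any particular library:

    import numpy as np

    def forward(w, b, x):
        # Step 2: use the current parameters to push the input through.
        return w @ x + b

    def train_step(w, b, x, y_target, lr=0.01):
        y = forward(w, b, x)
        error = y - y_target    # Step 3: compare against the given output...
        w = w - lr * error * x  # ...and propagate the error backward,
        b = b - lr * error      # updating the parameters (squared-loss gradient).
        return w, b

    # Step 1 supplies (x, y_target) pairs; Step 4 repeats the process.
    rng = np.random.default_rng(0)
    w, b = rng.normal(size=3), 0.0
    for _ in range(1000):
        x = rng.normal(size=3)
        w, b = train_step(w, b, x, y_target=x.sum())  # toy task: learn y = x1 + x2 + x3

A real DNN composes many such parameter-updating units into layers, but each unit’s job is exactly this: maintain weights, pass values forward, and adjust according to the error passed backward.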

Researchers commonly portray DNNs as networks of artificial neurons; information passes in both directions through the wires that connect these “neurons.” Our first slight modification to this analogy is to make the opposite directions of the two passes explicit by doubling each wire (see Figure 1a). Now the wires that run from left to right (labeled \(x_i\), \(y_i\), etc.) correspond to the forward pass, and the wires that run from right to left (labeled \(\Delta x_i\), \(\Delta y_i\), etc.) correspond to the backward pass.

Figure 1. By dragging the wire labeled \(\Delta y\) along the dotted arrow, we can “unfold” our artificial neuron (1a) into an interaction diagram (1b), such that the inputs to the artificial neuron turn into interior boxes and its output into an exterior box. Figure courtesy of Timothy Hosgood.

Next comes the most important mental step in our mathematical analogy. Imagine taking a single “neuron,” along with its input and output wires, and “unfolding” it (see Figure 1). The hierarchy changes from left-to-right to inside-to-outside (resulting in a trivial example of a so-called interaction diagram that we will discuss in more detail), and the multiple layers of artificial neurons thus become nested boxes within boxes (see Figure 2). This sort of nested composition is formalized using operads.

Before proceeding any further, let us take a moment to think about interaction diagrams. We can think of each box as a person who holds an office: the person can make decisions and the office has attendant capacities that allow the person to abstract the information that they receive from the world. In short, the person turns lower-level data into higher-level data. We refer to the person as the abstractor and the function of their office (i.e., how it interacts with the rest of the world) as the abstraction.

Figure 2. In the case of multiple artificial neurons, the unfolding procedure turns left-to-right hierarchy into inside-to-outside hierarchy. Here we indicate the idea via counts and colors. Figure courtesy of Timothy Hosgood.

Now we can understand interaction diagrams for DNNs, e.g., as in Figure 1. There are several smaller abstractors (which we call “workers”) whose offices are in the purview of—and hence pictorially drawn within—a higher-level abstractor (which we will refer to as “\(\textrm{M}\)” for “manager”). \(\textrm{M}\) communicates with the outside world, sending out information and receiving feedback. To do so, it listens to its subordinate workers, trusting them to varying degrees (as described by \(\textrm{M}\)’s weights and biases, which it is constantly updating), and passes along specific feedback to each worker. But the method of sending out information in DNNs is unrefined: it is as though each abstractor merely shouts out the results to its superiors. In proportion to how loudly each worker shouts and according to \(\textrm{M}\)’s current weights and biases, \(\textrm{M}\) constructs an output, which it then shouts out. This whole system can recurse both inwards and outwards, with each worker as the manager of its own team and \(\textrm{M}\) as a subordinate to higher-level managers. In DNNs, the levels of this hierarchy are called layers.
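
In code, this shouting match is nothing more than a weighted sum followed by a nonlinearity. Here is a small sketch (all names are ours, purely illustrative) of how \(\textrm{M}\) might combine its workers’ outputs:

    def activation(z):
        # Any standard nonlinearity will do; here, a rectifier.
        return max(z, 0.0)

    def manager_output(worker_outputs, weights, bias):
        # M listens to each worker, trusting each to a varying degree:
        # a louder shout (larger output) weighted by greater trust
        # (larger weight) contributes more to the result.
        total = sum(w * s for w, s in zip(weights, worker_outputs))
        # M then shouts its own result out to its superiors.
        return activation(total + bias)

    print(manager_output([0.2, 0.9, 0.4], weights=[1.5, -0.3, 0.8], bias=0.1))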

Figure 3. A deep neural network usually corresponds to an interaction diagram with trivial peer-to-peer wiring. Figure courtesy of Timothy Hosgood.

Of course, a scenario where workers compete in a shouting match to get their manager’s attention does not represent an ideal working environment. Most notably, workers in the same layer of a DNN never talk to each other or collaborate (see Figure 3). Yet this is how almost all neural networks function today; the only thing that an artificial neuron can do is add incoming signals according to weights and biases. Certain loss functions—as seen in physics-informed neural networks, for example—do make a small amount of headway, but the breadth of what is possible does not seem to be common knowledge within the machine learning community.

To be clear, this lack of peer-to-peer interaction is not remedied by introducing loops into a typical DNN diagram, as in recurrent neural networks. These loops—which come with weight-tying—serve to put an abstractor inside itself, not to create peer-to-peer message passing. We therefore pose the following question: “What if we allowed our DNNs to have internal wiring?” (see Figure 4).
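
To see why such loops amount to self-nesting rather than peer-to-peer messaging, note that a recurrent network reuses one weight-tied update at every time step. A schematic sketch (illustrative names, not the authors’ formalism):

    import numpy as np

    def rnn_unroll(W, U, h0, xs):
        # Weight-tying: the *same* matrices W and U are reused at every
        # step, i.e., the abstractor is placed inside itself over time...
        h = h0
        for x in xs:
            h = np.tanh(W @ h + U @ x)
        # ...but units within a layer still never message one another.
        return h

    W, U = 0.5 * np.eye(2), np.eye(2)
    print(rnn_unroll(W, U, h0=np.zeros(2), xs=[np.ones(2)] * 3))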

Figure 4. A sketch of our proposed model where nontrivial wiring is allowed (compare with Figure 3). Figure courtesy of Timothy Hosgood.

Open Dynamical Systems Interact But Do Not Learn

Let us now discuss interaction diagrams—a generalization of the wiring or block diagrams that are common in control theory—and how they describe interacting dynamical systems. Although such diagrams gain a great deal of generality by possessing nontrivial wirings, they have no notion of the adaptivity that exists in DNNs.

Consider a computer processor, which consists of many small parts that are themselves composed of even smaller parts. The adder serves as an abstractor of the logic gates, and the logic gates themselves are abstractors of the transistors. Because some abstractions are more useful than others, researchers have devoted a great deal of effort to finding the right ones (such as NOT, AND, or OR). The parts interact with each other at each level of the hierarchy to form the higher-level functionality of the processor. But the processor itself forms a smaller part of a larger system, such as a computer or mobile phone, which in turn might combine with others to form networks with their own emergent behaviors.

Figure 5. Construction of the logical \(\textrm{OR}\) gate from \(\textrm{NAND}\) gates. The wires describe the interconnectivity of the system’s inputs and outputs. The blue boxes demarcate (sub)systems, with the \(\textrm{NOT}\) gate as an intermediate abstraction between the lower-level \(\textrm{NAND}\)s and the higher-level \(\textrm{OR}\). Figure courtesy of Timothy Hosgood.

Vague descriptions of how simpler abstractions can combine to form higher-level ones are formalized by the category-theoretic notion of operads. However, we can draw pictures of systems and subsystems using interaction diagrams and leave the category theory hidden behind the scenes; for example, Figure 5 displays the construction of the logical OR gate via NAND gates. But category theory lets us regard these diagrams and their contents as formal mathematical objects and prove things about them. Figure 6 provides an example of this concept and shows that we can embed a certain sort of mathematical object (e.g., systems of ordinary differential equations) into these diagrams and obtain another such object as a result.
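
The nesting in Figure 5 is easy to mimic in code, with each level of abstraction built only from the level below. A minimal sketch:

    def NAND(a, b):
        # Base-level system.
        return not (a and b)

    def NOT(a):
        # Intermediate abstraction: one NAND with both inputs wired together.
        return NAND(a, a)

    def OR(a, b):
        # Higher-level abstraction: OR(a, b) = NAND(NOT(a), NOT(b)).
        return NAND(NOT(a), NOT(b))

    # The composite reproduces the usual truth table for OR.
    assert [OR(a, b) for a in (False, True) for b in (False, True)] == [False, True, True, True]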

In this diagrammatic approach to interacting dynamical systems, the base-level systems change through time but the formalism seems to insist that their interaction patterns do not. For example, the connections would be permanently soldered in the real-world version of the electronic circuit in Figure 5; as a result, the way in which the subsystems interact would never change. Fixed interaction patterns have certainly proven very useful from the birth of modern electronics to the present day. But the wirings between systems can often change in real life; field-programmable gate arrays serve as one example, but so do more common scenarios like mobile phones that switch from one cell tower to another. We can even regard human society as a dynamical system wherein the connections constantly change. For instance, two people who might be “wired together” (e.g., talking via video call) at one point will then be “un-wired” (e.g., ending the call) at a later point. In the language of interaction diagrams, we refer to continuously changing interaction patterns as dynamic rewiring.
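
The contrast shows up even in a toy sketch of the cell-tower example (names hypothetical): a soldered circuit fixes its connections at construction time, whereas dynamic rewiring treats the interaction pattern itself as state that can be updated while the system runs.

    class Phone:
        def __init__(self, name):
            self.name = name
            self.tower = None   # the system it is currently wired to

        def rewire(self, tower):
            # Dynamic rewiring: the interaction pattern is itself mutable
            # state, updated while the overall system keeps running.
            self.tower = tower

    phone = Phone("alice")
    phone.rewire("tower_A")   # wired together...
    phone.rewire("tower_B")   # ...and later rewired, mid-operation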

We previously claimed that DNNs already have the capacity for such dynamic rewiring. Indeed, this phenomenon is inherent in the way that these systems learn: the values of weights and biases change after every batch of training data. Yet despite this strength, they operate as mere shouting matches and lack peer-to-peer collaboration. The next question is naturally: “Can we have the best of both worlds and allow our systems to have (a) nontrivial peer-to-peer wiring that (b) can change and adapt with time?”

Figure 6. Variable sharing allows for the composition of open dynamical systems. Figure courtesy of Timothy Hosgood.
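
Although we do not reproduce Figure 6’s particular equations here, the idea of composition by variable sharing can be sketched with two hypothetical open systems and a forward-Euler step:

    def composite_step(state, dt=0.01):
        # Two open systems, composed by variable sharing (cf. Figure 6).
        x, y = state
        dx = -x + y   # system 1: dx/dt = -x + u1, with input u1 wired to y
        dy = -y + x   # system 2: dy/dt = -y + u2, with input u2 wired to x
        # Wiring each system's output to the other's input yields a single
        # composite system with state (x, y).
        return (x + dt * dx, y + dt * dy)

    state = (1.0, 0.0)
    for _ in range(500):
        state = composite_step(state)   # the composite evolves as one system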

Deeply Interactive Learning Systems: The Best of Both Worlds

We can answer this last question affirmatively with the use of category theory (specifically the formalism of polynomial functors). The two aforementioned generalizations actually give one single structure; that is, an interacting dynamical system with dynamic rewiring and a DNN with peer-to-peer messaging are simply two different descriptions of the same mathematical object. We call this object a DILS. Passing between these two viewpoints allows us to better understand the analogy between them.
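
To give a taste of the formalism (a sketch only; see the further reading for precise definitions), a polynomial functor in one variable \(y\) is a sum of representables,
\[
p(y) = \sum_{i \in I} y^{A_i},
\]
and a discrete dynamical system with state set \(S\), inputs \(A\), and outputs \(B\) is a map of polynomials
\[
Sy^S \longrightarrow By^A,
\]
i.e., a readout function \(S \to B\) together with an update function \(S \times A \to S\). Richer choices of polynomials encode richer interfaces, including interaction patterns that depend on the current state; this is where dynamic rewiring enters.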

With a DILS, the usual discrete partition of the learner’s life into a “learning phase,” a “testing phase,” and an “implementation phase” no longer exists. The system is instead continuously online and embedded in the actual world, as is the case with control systems. Prediction error—how well the high-level abstractions actually offer affordances to the system—simply replaces the usual notion of a trainer. From the DNN point of view, “the current collection of weights and biases” generalizes to “the current interaction pattern between components.” Furthermore, these interaction patterns are much more collaborative and cooperative than the simple shouting match that weights and biases describe. They also help us understand the relation between the data that flows through a DNN and abstraction itself: the left-hand layer corresponds to low-level data (i.e., “pixels”), the right-hand layer corresponds to high-level data (i.e., “cat”), and processing data corresponds to the creation of higher-level abstractions (i.e., “pixels become curves and features, which become ears and whiskers, which become a cat”). This concept corresponds to the movement from interior to exterior boxes in Figure 4, whereas the movement along wires from left to right corresponds more to the transition from sensory input to motor output.

Interacting dynamical systems with fixed wiring have been extraordinarily useful over the last 50 years, but they are inherently static. The power of DNNs explicitly relies on dynamic rewiring, yet they neglect the possibility of peer-to-peer messaging. Category theory allows us to combine the strengths of these two architectures into a single, more general framework while also correcting the structural flaw in the usual analogy between DNNs and brain anatomy.


David I. Spivak presented this research during a minisymposium at the 2021 SIAM Conference on Applications of Dynamical Systems, which took place virtually in May 2021.

Further Reading
[1] Cruttwell, G.S.H., Gavranović, B., Ghani, N., Wilson, P., & Zanasi, F. (2021). Categorical foundations of gradient-based learning. Preprint, arXiv:2103.01931.
[2] Fong, B., Spivak, D.I., & Tuyéras, R. (2019). Backprop as functor: A compositional perspective on supervised learning. In 2019 34th annual ACM/IEEE symposium on logic in computer science (LICS) (pp. 1-13). Vancouver, BC, Canada: IEEE.
[3] Fong, B., & Spivak, D.I. (2019). An invitation to applied category theory: Seven sketches in compositionality. Cambridge, U.K.: Cambridge University Press. (Freely available online at arXiv:1803.05316.)
[4] Mitchell, T., Cohen, W., Hruschka, E., Talukdar, P., Yang, B., Betteridge, J., … Welling, J. (2018). Never-ending learning. Commun. ACM, 61(5), 103-115.
[5] Selinger, P. (2010). A survey of graphical languages for monoidal categories. In B. Coecke (Ed.), New structures for physics. New York, NY: Springer.
[6] Vagner, D., Spivak, D.I., & Lerman, E. (2015). Algebras of open dynamical systems on the operad of wiring diagrams. Theory Appl. Categories, 30(51), 1793-1822.


David I. Spivak has spent the last 14 years at the University of Oregon, the Massachusetts Institute of Technology, and the Topos Institute. He uses category theory to articulate the compositional structure in various domains—from dynamical systems to databases—in a variety of interrelated projects for NASA, the National Science Foundation, and the U.S. Departments of Defense and Commerce. Spivak has written three books on applications of category theory.

Timothy Hosgood completed his graduate studies in category theory and geometry at Aix-Marseille Université. He has worked with the Centre for Quantum Technologies and the Topos Institute and currently holds a postdoctoral position at Stockholms Universitet. Hosgood is interested in the use of mathematics and software development to create better systems for global communication within and between communities.