How "Self-Communication" May Bootstrap Cognition

Dylan R. Cope · August 10, 2020

Communicative Autostimulation

In his 1991 book, Consciousness Explained, Daniel Dennett sketches a hypothetical, “just-so” story for the evolution of the “internal voice”. We start by considering some early ancestors of ours on the verge of developing language. These early Homo sapiens would have been endowed by natural selection with some basic communicative instincts, such as making vocalisations or gestures to indicate such sentiments as “There’s No Food Here”, or “Danger Ahead”. Such talents have already been noted in other hominid species, so it is not a stretch to suppose that we also started with something like this. Dennett asks us to speculate that perhaps when one of these protohumans was stuck on a project, say cracking open a nut or searching for an item, they could “ask for information” from another member of their community. For this practice to be of any use, there would need to be the codevelopment of another talent: answering these calls for information.

One fine day in this imaginary world, we are invited to suppose that one of these creatures, being clumsy and irrational, makes an information-seeking vocalisation when no-one is around to hear… except themselves! Upon hearing their own utterance, they are overcome with the desire to answer their own question. A peculiar occurrence, but to the creature’s delight they have found what they needed to find. They have probed themselves for information by using their protolanguage as a mechanism for getting different parts of their own brain to talk to one another. The following figure, taken from Dennett’s book, shows this information transfer graphically. Knowledge about X is transferred to a disconnected part of the brain through the speech act.

Dennett goes on to show us how such “crude habits” of autostimulation might get baked into the genetic code of the species by virtue of the Baldwin effect: individuals who autostimulate are able to solve more problems than those who don’t, and thereby gain a fitness boost. This means that once autostimulation becomes a learned habit, those who are not predisposed to the behaviour are at a disadvantage and will not be expected to last for much longer. As the arms race continues, some might stumble upon the behaviour of making the vocalisations more and more quietly so as to strategise privately and outsmart other members of their species. There is even some evidence of a progression like this happening for humans in the form of vestigial movements of the vocal cords when people think words.

Meaning and Communication

To investigate this further, let’s look at two different communication schemes and see what their implications are for autostimulation:

  • Direct communication: This is the kind of one-to-one communication through a channel that is typical of information theory and digital signal processing.

  • Stigmergic communication: This is a term borrowed from the study of insects by researchers in swarm intelligence. It broadly refers to communication through the environment. For example, an ant leaves a trail of pheromones to communicate the path that it is travelling to other ants.

The natural reaction to this distinction is to highlight stigmergy as the kind of formalism that allows for communicative autostimulation, as once the agent’s message is in the environment it is able to be observed by the agent itself. Dennett gives another example in his book of autostimulation via drawing diagrams (that is later baked into reasoning via mental imagery). However, the primary example, the speech act, is an example of direct communication. This may seem mistaken: vocalisations travel through the medium of air (i.e. the environment), so shouldn’t they be considered stigmergic? The problem is that speech acts themselves never “linger in the air”; only very momentary sound fragments do. As we do not interpret individual sound fragments as having any meaning, we cannot say that the communicative act is stigmergic in any sense. In other words, we can say that communicative acts are stigmergic if and only if they persist in the environment as coherent wholes that “carry meaning”. But what do we mean by “carries meaning”?

Suppose that we have two agents communicating with one another by taking turns leaving red or blue coloured blocks in a room. At first glance, this seems stigmergic - they are using the environment to send messages. However, there is an important factor to consider regarding the complexity of other features of the agents’ environment outside the room. If the agents live in a world with only two objects that they wish to discriminate between, then a single block could carry any meaning, in the sense that observing a particular block can tell one of these agents everything they wish to know. This is because a single block has the potential to map to all distinct objects in the environment: red for one object and blue for the other. In broader terms, by the phrase “an observation carries meaning for an agent” we mean that the observation triggers the reaction in the agent of narrowing its focus on a particular region of its ontology. Ontology is the study of “the nature of existence”, but Dennett uses the term “an agent’s ontology” to refer to the set of things that the agent itself perceives as existing. Through this we can see that the phrase “the communicative act carries meaning” is incomplete, as it frames the meaning as some agent-independent property of the act. After all, examples such as the word “pain” carrying the meaning of “bread” for a French person make this fact so painfully clear that it almost seems ridiculous to bother saying it! Regardless of this caveat, we will continue to use the phrase as a useful shortcut.

Great, so we have established that this red-blue blocks communication scheme is stigmergic - didn’t we already know that? However, if the agents live in a world with more interesting objects, a single block will no longer cut it. If there are four objects, they will need to use two blocks to create such a mapping, e.g.: blue-blue for object 1, blue-red for object 2, red-blue for object 3, and red-red for object 4. As such, no single block carries the whole message, and we have moved away from stigmergic communication and towards direct communication.
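
As a minimal sketch of the mapping just described, the snippet below builds a codebook from sequences of red and blue blocks to objects. The names `blocks_needed` and `build_codebook` are hypothetical helpers introduced purely for illustration, and the ordering of the codes is an assumption chosen to match the four-object example in the text.

```python
from itertools import product

COLOURS = ("blue", "red")


def blocks_needed(num_objects: int) -> int:
    """Smallest number of two-colour blocks that can label every object."""
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1; always use at least one block.
    return max(1, (num_objects - 1).bit_length())


def build_codebook(objects):
    """Map each block sequence (a tuple of colours) to one object."""
    length = blocks_needed(len(objects))
    codes = product(COLOURS, repeat=length)
    return dict(zip(codes, objects))


two_world = build_codebook(["object 1", "object 2"])
four_world = build_codebook(["object 1", "object 2", "object 3", "object 4"])

print(blocks_needed(2), blocks_needed(4))
print(four_world[("blue", "red")])
```

With two objects a single block suffices, whereas with four objects every message is a pair of blocks, so the receiver has to aggregate two observations before the mapping is complete.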

But wait, in the example with four objects, it’s not as if the individual blocks carry no meaning! If you receive a blue block first, you can immediately notice that your interlocutor is focusing on the “object 1, object 2” superobject. This blue block allows us to discriminate between two superobjects - it has a similar “mapping property” as in the case with two objects. Therefore, this communicative act is (1) through the environment, and (2) carries meaning as a whole, and so, quod erat demonstrandum, must also be stigmergic! Are we barrelling towards the conclusion that there is no distinction between stigmergic and direct? Perhaps, but a potential remedy is that we could argue for some kind of “temporal invariance” condition on the mapping between the communicative acts and the objects of the world. The blue block “points towards” the “object 1, object 2” superobject when it is observed first, but it points to either object 1 or object 3 when it is observed second.
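
The failure of temporal invariance in the four-object example can be checked with a short sketch. The `CODEBOOK` layout and the `candidates` helper are assumptions made for illustration; they simply restate the blue-blue, blue-red, red-blue, red-red mapping from the thought experiment.

```python
from itertools import product

# Two-block codes in the order used in the thought experiment:
# blue-blue, blue-red, red-blue, red-red -> objects 1 to 4.
CODEBOOK = {
    code: f"object {i}"
    for i, code in enumerate(product(("blue", "red"), repeat=2), start=1)
}


def candidates(colour: str, position: int) -> set:
    """Objects still possible after seeing `colour` at `position` (0 or 1)."""
    return {obj for code, obj in CODEBOOK.items() if code[position] == colour}


blue_first = candidates("blue", 0)
blue_second = candidates("blue", 1)

print(sorted(blue_first))   # ['object 1', 'object 2']
print(sorted(blue_second))  # ['object 1', 'object 3']
```

The same blue block narrows the ontology to a different superobject depending on when it is observed, which is exactly what a temporal invariance condition would rule out.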

However, if we exit this thought experiment and examine real language we can see that this too breaks down. Consider the word “cat”. If we decompose it into the letters “c”, “a” and “t”, we can see that individually they are like the red and blue blocks. If you received them one at a time through a window, then on receiving the “c” you could narrow down the word to “words that start with c” (i.e. a strange variant of Hangman). But if you found a “c” elsewhere in the word you would be doing quite a different narrowing down of the possible words. Hence, the temporal invariance property is broken and the letter “c” isn’t said to carry any meaning. So far, this probably roughly gels with most people’s intuitions. But let’s push a little further. Returning to the word “cat”, we would certainly want to say that it carries meaning. After all, it neatly partitions the world into cats and not-cats. Yet, consider how the word can be used in a phrase, such as “my cat” or “the cat in the hat”. The position of the word relative to other words augments the meaning: we transition from referring to the superobject of all cats to particular cats. This isn’t really surprising; if the temporal invariance property were the defining feature of “meaning” then grammar would be pointless and we could communicate effectively with just bags of words.

At any rate we have a method for determining a “degree of meaning” for a communicative act, namely the amount of information it transmits in terms of the receiving agent’s ability to make finer and finer discriminations in its environment. This then translates to two factors that define more direct or more stigmergic communication: the “persistence” of the act in the environment (how many opportunities there are for it to be observed), and the amount of meaning “carried within” the persistent object. Ant pheromones dissipate, vocalisations propagate away at the speed of sound, and books deteriorate. But books and pheromones persist longer than instantaneous vocalisations and carry more meaning to their interpreters.
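
One way to put a number on this “degree of meaning” is to count how many bits of discrimination an observation buys the receiver. The sketch below assumes, purely for illustration, that the agent treats every object in its ontology as equally likely, so the information gained is the base-2 logarithm of how much the candidate set shrinks. The function name `bits_of_meaning` is hypothetical, and persistence is not modelled here.

```python
import math


def bits_of_meaning(candidates_before: int, candidates_after: int) -> float:
    """Bits gained when an observation shrinks a uniform candidate set."""
    if not 0 < candidates_after <= candidates_before:
        raise ValueError("need 0 < candidates_after <= candidates_before")
    return math.log2(candidates_before / candidates_after)


# Four-object world: one block halves the candidates, two blocks pin down the object.
one_block = bits_of_meaning(4, 2)
two_blocks = bits_of_meaning(4, 1)
# An observation that rules nothing out carries no meaning in this sense.
no_narrowing = bits_of_meaning(4, 4)

print(one_block, two_blocks, no_narrowing)  # 1.0 2.0 0.0
```

Under this assumption a single block in the four-object world carries half the meaning of the complete two-block message, which matches the intuition that individual blocks are meaningful but incomplete.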

Insights for AI

Despite Dennett’s sketch being fabricated, it at least presents a plausible mechanism by which an agent might gain capabilities. As AI developers, it is always a good strategy to open up as many such paths as possible for our systems to become more intelligent. Many existing “learning to communicate” approaches in machine learning, such as Foerster et al. (2016), model the communicative acts in terms of a channel whereby one agent sends a message that the other receives (i.e. direct communication). In other words, there is no chance for the kind of autostimulation that we have been discussing. However, there is autostimulation in the sense that these models are recurrent neural networks (RNNs), which means that at each timestep the agent can “send” activations, called mental states, to itself at the next timestep. These activations provide a flow of information from past observations to current actions. So if we already have autostimulation in this form, why should we expect “vocal” autostimulation to provide any benefit? Is this simply a case of overenthusiasm for being inspired by biology? Maybe, but let’s indulge anyways. Here is what communicative autostimulation might look like diagrammatically:

In this figure we are representing the agent as a composition of two functions, f and g, shown as blue rhombi. The diagram above the dotted line represents the agent at time t and the diagram below the line is the agent at t+1. The agent receives observation o_t at time t, then produces mental state z_t = f(o_t, z_{t-1}), and computes an output action a_t = g(z_t). The information flow from the mental state to the next action (and thereby from the observation to the next action) is shown by the arrow from the upper purple box to the lower one. In order to show what communicative autostimulation looks like, we draw an arrow from the actions a_t to the observations at the next timestep o_{t+1}. From this new flow of information we can see that there are parts of the agent that were previously not connected together, which may be useful: features extracted by g are now potentially accessible to computations done in f.
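
To make the information flow concrete, here is a toy sketch of the loop z_t = f(o_t, z_{t-1}), a_t = g(z_t), with an optional arrow from a_t into o_{t+1}. Everything specific in it is an assumption for illustration: the scalar mental state, the hand-picked weights in `f`, the two-symbol action space in `g`, and the `run` helper. A real agent would learn f and g rather than have them fixed like this.

```python
import math


def f(o, z_prev):
    """z_t = f(o_t, z_{t-1}); here o_t = (external input, action heard)."""
    external, heard = o
    # Hand-picked weights, not learned: purely illustrative.
    return math.tanh(0.5 * z_prev + external + 0.25 * heard)


def g(z):
    """a_t = g(z_t): discretise the mental state into one of two symbols."""
    return 1 if z > 0 else 0


def run(externals, autostimulate):
    """Roll the agent forward, optionally feeding a_t into o_{t+1}."""
    z, heard = 0.0, 0
    actions = []
    for external in externals:
        z = f((external, heard), z)
        a = g(z)
        actions.append(a)
        # The extra arrow: the agent "hears" its own previous action.
        heard = a if autostimulate else 0
    return actions


without_loop = run([1.0, -0.5], autostimulate=False)
with_loop = run([1.0, -0.5], autostimulate=True)
print(without_loop, with_loop)  # [1, 0] [1, 1]
```

The two runs diverge at the second timestep only because the discretised output of g re-enters f as part of the observation, which is the new connection the extra arrow provides.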

So it is at least plausible that the agent may receive some basic benefit from this new information flow, but there is still another reason that we might expect even more benefit. As shown in the diagram above, the process of going from observations to actions is one of dimensionality reduction and discretisation. In particular, when the action is communicative it is “carrying” semantic information that is interpretable by your interlocutor. In other words, the features are organised in such a way that we should expect them to be easier for the agent to extract. Therefore, the additional pressure on the speech act to be communicative encourages the agent to form generally useful representations of the information. Hence, we should expect that autostimulation could have effects similar to those described in Dennett’s tale.

Through this investigation we’ve found some degrees of variation that may be of interest to experiment with: more stigmergic versus more direct self-communication. An important part of whether or not autostimulation in AI will be effective is the amount of meaning that is conveyed in the self-communicative acts. In Dennett’s example above of drawing diagrams, we saw a highly stigmergic autostimulation that is so good at conveying meaning that I’ve employed it several times in this post. Yet in the example of direct vocalisations we see that in complex environments there is a prior necessity for agents to have the ability to quickly aggregate successions of symbols into larger blocks of meaning (“c”-“a”-“t”… “cat”!). If an AI system simply observes the “c” that it had uttered in the previous timestep, it is hard to see how it would be able to glean much that is useful from that autostimulation.

Yet, we could make the same argument regarding the agent only observing the fleeting parts of speech acts that come from the other agent. If two agents are able to learn to communicate over a direct communication channel, then learning to autostimulate across such a channel should also be possible. However, a stigmergic channel alongside the direct one could lead to the eventual emergence of more interesting self-stimulation, and studying the emergence of stigmergic-only communication in learning systems is interesting in its own right. In conclusion, opening paths of communicative autostimulation could potentially enhance the autocurricula (the automatically generated sequences of progressively harder challenges) that a multi-agent system uses to develop greater and greater capabilities.
