Virtual agents are rapidly gaining importance in the digital landscape; intelligent chatbots such as the ones developed at VIQTOR DAVIS are a prominent example. However, teaching a chatbot to generate meaningful language usually requires large amounts of scripted dialogue and labelled text corpora. The next big step is to design virtual agents that autonomously develop a protocol to communicate with humans and with each other.
In this paper, we look at virtual agents that have to complete increasingly difficult tasks in an artificial environment, where each task is expressed as a natural language instruction (see the figure for an example). The default agent architecture takes the available sensory information and the natural language instruction as input and uses deep learning to predict the best action at each time step. Additionally, we consider agents whose architecture is extended with an unsupervised language-generating module. During model optimization, these agents are free to develop a communication protocol, on which we impose no limitations apart from vocabulary size and sentence length.
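To make the architecture concrete, the sketch below shows a toy stand-in for such an agent: observation and instruction go in, an action distribution comes out, and an optional message head emits tokens under the only two constraints mentioned above, vocabulary size and sentence length. All dimensions, layer choices, and names here are illustrative assumptions, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper's real dimensions are not stated here.
OBS_DIM, INSTR_DIM, HIDDEN = 49, 16, 32
N_ACTIONS = 6                  # e.g. turn left/right, forward, pick up, drop, done
VOCAB_SIZE, MAX_LEN = 10, 4    # the only limits imposed on the emergent language


def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()


class Agent:
    """Toy version of the default architecture, extended with a message head."""

    def __init__(self):
        self.W_obs = rng.normal(size=(HIDDEN, OBS_DIM)) * 0.1
        self.W_ins = rng.normal(size=(HIDDEN, INSTR_DIM)) * 0.1
        self.W_act = rng.normal(size=(N_ACTIONS, HIDDEN)) * 0.1
        self.W_msg = rng.normal(size=(VOCAB_SIZE, HIDDEN)) * 0.1

    def act(self, obs, instr):
        # Fuse sensory input and instruction, then score the possible actions.
        h = np.tanh(self.W_obs @ obs + self.W_ins @ instr)
        return softmax(self.W_act @ h), h

    def speak(self, h):
        # Unsupervised message head: the tokens have no predefined meaning;
        # whatever protocol emerges is shaped only by training pressure.
        probs = softmax(self.W_msg @ h)
        return [int(rng.choice(VOCAB_SIZE, p=probs)) for _ in range(MAX_LEN)]


agent = Agent()
action_probs, h = agent.act(rng.normal(size=OBS_DIM), rng.normal(size=INSTR_DIM))
message = agent.speak(h)
```

In the real model these heads would be trained end-to-end with deep learning; the point of the sketch is only the interface: the same hidden state drives both action selection and utterance generation.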
As it turns out, the language that emerges completely independently can be interpreted in a surprisingly intuitive way. Moreover, when we pair a language-generating 'expert' agent with a standard one, the expressions uttered by the expert help the other agent solve unseen games significantly faster. This reduction in training time implies a reduction in computational effort, and thereby in costs.
In our opinion, the presented results demonstrate the powerful language-generating capacities of deep learning models, even when no direct supervision is available. Furthermore, the combination of interpretability and reduced training costs demonstrates the value that intelligent use of language can add to deep learning systems. At present, techniques for autonomous language generation are not mature enough for large-scale industrial applications, but perhaps in a few years chatbots will be able to learn to speak a language completely by themselves.
Read the complete in-depth paper here: Mastering emergent language: learning to guide in simulated navigation.
Screenshot of the sample task 'put the green box next to the green ball'. The red arrow represents the virtual agent, and the shaded area marks the currently observable part of the environment.