Relational RL
TL;DR: RRL combines Reinforcement Learning with Relational Learning or Inductive Logic Programming by representing states, actions, and policies in a first-order (relational) language.
We apply a CNN to the raw image and get a feature map with one $k$-dimensional vector per pixel ($k$ is the number of output channels of the CNN); each vector is then concatenated with its $x$ and $y$ coordinates to indicate its position in the map.
We treat the resulting pixel-feature vectors as the set of entities.
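As a rough sketch of this step (using PyTorch; the layer sizes, kernel sizes, and names are placeholders, not the paper's exact configuration), entity extraction could look like:

```python
import torch
import torch.nn as nn

class EntityExtractor(nn.Module):
    """Turns an image into a set of entity vectors (one per CNN feature-map pixel)."""

    def __init__(self, in_channels: int = 3, k: int = 64):
        super().__init__()
        # Small CNN producing a k-channel feature map (sizes are illustrative).
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=2, stride=1), nn.ReLU(),
            nn.Conv2d(32, k, kernel_size=2, stride=1), nn.ReLU(),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (batch, channels, H, W)
        fmap = self.cnn(image)                      # (batch, k, h, w)
        b, k, h, w = fmap.shape
        # Build (x, y) coordinates for every feature-map pixel.
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, h), torch.linspace(-1, 1, w), indexing="ij"
        )
        coords = torch.stack([xs, ys], dim=0)       # (2, h, w)
        coords = coords.unsqueeze(0).expand(b, -1, -1, -1).to(fmap)
        # Append coordinates as two extra channels, then flatten space into a set.
        fmap = torch.cat([fmap, coords], dim=1)     # (batch, k+2, h, w)
        entities = fmap.flatten(2).transpose(1, 2)  # (batch, h*w, k+2)
        return entities
```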
The output is then passed to the relational module, where we iteratively apply an "attention block" to the entity representations.
The attention block is the same as the Multi-Head Attention module in the Transformer.
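A minimal sketch of such a relational module, assuming standard multi-head self-attention with a residual connection and layer normalization (the number of blocks, heads, and the entity dimension are my placeholders; dim = 66 just matches k = 64 plus the two coordinates in the sketch above):

```python
import torch
import torch.nn as nn

class RelationalModule(nn.Module):
    """Iteratively applies multi-head self-attention over the set of entities."""

    def __init__(self, dim: int = 66, num_heads: int = 2, num_blocks: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_blocks)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_blocks))

    def forward(self, entities: torch.Tensor) -> torch.Tensor:
        # entities: (batch, num_entities, dim)
        x = entities
        for attn, norm in zip(self.blocks, self.norms):
            # Self-attention: each entity attends to every other entity.
            attended, _ = attn(x, x, x)
            x = norm(x + attended)  # residual connection, as in the Transformer
        return x
```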
Actions: left, right, up, down.
The agent picks up keys and opens boxes (a box is two adjacent colored pixels).
The agent can pick up loose keys (an isolated colored pixel) and open boxes whose locks match a key it holds.
Most boxes contain keys in them, and one box contains a gem (colored white); the goal of the game is to reach the gem.
The output of the relational module is aggregated via feature-wise max-pooling across space (collapsing the spatial feature map into a single $k$-dimensional vector); this feature vector is then used to produce the value and policy for an actor-critic algorithm.
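Continuing the sketch, the pooling and actor-critic heads could look like this (the feature dimension, the four movement actions, and the overall wiring are placeholders, not the paper's exact setup):

```python
import torch
import torch.nn as nn

class ActorCriticHead(nn.Module):
    """Max-pools entity features over space, then produces a policy and a value."""

    def __init__(self, dim: int = 66, num_actions: int = 4):
        super().__init__()
        self.policy = nn.Linear(dim, num_actions)  # logits over actions
        self.value = nn.Linear(dim, 1)             # state-value estimate

    def forward(self, entities: torch.Tensor):
        # entities: (batch, num_entities, dim) from the relational module.
        # Feature-wise max-pooling across the entity (spatial) axis.
        pooled, _ = entities.max(dim=1)            # (batch, dim)
        return self.policy(pooled), self.value(pooled)

# Example wiring with the sketches above (shapes are illustrative):
extractor = EntityExtractor(in_channels=3, k=64)
relational = RelationalModule(dim=66)
head = ActorCriticHead(dim=66, num_actions=4)

image = torch.randn(8, 3, 14, 14)                 # batch of fake observations
logits, value = head(relational(extractor(image)))
```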