A major problem faced by an RL agent is how to determine the relevant states and actions in the first place: when faced with noisy sensory information from the world, how does the agent determine the
relevant features that constitute a state, and then identify what are the relevant actions in that state? 27, 28 and 29]. This problem is essentially one of perception and sensorimotor learning, as it depends on the capacity to segment and identify relevant objects, contexts MK 1775 and actions 30, 31, 32 and 33]. One approach to this problem involved setting up an experimental situation in which a given stimulus has multiple dimensional attributes (e.g. shape, color, motion). Inspired by earlier cognitive set-shifting tasks 34 and 35], one of these dimensions is unbeknownst to the participant, selected to be ‘relevant’ in terms of being associated with a reward, and the goal of the agent is to work out which attribute is relevant, as well as to work out which exemplar within an attribute (e.g. a green color vs a red color) is actually reinforced 36 and 37]. Bayesian inference or RL can then be used to establish the probability
that a particular dimension is relevant, which can then be used to guide further learning about the value of individual exemplars within a dimension. The ability to construct a simplified representation of the environment focused only on essential details reduces the complexity ABT 199 of the state-space encoding
problem. One way to accomplish this is to represent states by their degree of similarity to other states either via relational logic [38], transition statistics [39•] or feature-based metrics 40 and 41]. Furthermore, generalized state-space representations can speed up state-space learning considerably by avoiding the time cost of re-learning repeated environmental motifs (if I learn how to open my first door, I can Silibinin generalize this to all doors). RL agents ‘in the real world’ can suffer from a dimensionality problem in which there are too many states over which to integrate information to make decisions let alone learn [42]. It has been proposed that state-space structures be compressed in order to make calculations tractable. In particular, multiple actions (and their interceding states) might be concatenated into ‘meta-actions’ or, more generally, ‘options’ [43]. Decision policies would be developed over these options rather than individual actions thus reducing the computational complexity of any policy-learning algorithm.