The Two Sides of the Doors of Perception
Vision in everyday waking consciousness and the psychedelic state

This article by Shayam Suseelan first appeared in Psychedelic Press XXXVIII.
Can we see everything as it is, or do we make a best guess about reality? A computational and cognitive neuroscience framework of visual perception called the predictive processing (PP) model has roots in the latter conjecture. This top-down model supposes perception to be the result of an inferential hierarchical process between distinct neuronal subpopulations: neurons that predict sensory information and neurons that signal deviations, or ‘errors’, from current predictions. In individuals with psychosis, however, this process appears to be disrupted. The present review therefore also considers the mode of perception in these individuals through studies utilising psychedelics.
The relaxed beliefs under psychedelics (REBUS) model is used to highlight this alternative mode of perception. REBUS proposes that psychedelics help to relax heavily weighted predictions or beliefs and thus give rise to a bottom-up cascade of sensory influx. This can be applied in a real-world setting through therapy, whereby psychedelics aim to dismantle pathological beliefs or delusions as seen in psychosis. Both these models of perception have shown promising results as outlined by the neurophysiological studies but are limited by methodological and theoretical constraints. These issues are considered and suggestions for future work are also provided.
Introduction
‘It’s not what you look at that matters, it’s what you see.’
- Henry David Thoreau (1854), Walden
Perception is the process of making sense of a stimulus. The classic modalities of perception are touch, taste, smell, hearing, and sight; the last of these is the focus of this review. It must be noted, however, that this list is an abstraction: other senses, such as proprioception,[1] operate on an everyday basis and often go unnoticed. The ability to perceive and become aware of what is around us forms the basis of our reality, making us conscious of our surroundings.
Helmholtz proposed that perception results from inferences drawn from previous sensory stimuli rather than from present sensory inputs alone.[2] The current review highlights a contemporary model of perception called ‘predictive processing’,[3] which takes inspiration from Helmholtz’s work. The review also considers the perceptual abnormalities of individuals with psychosis, using the relaxed beliefs under psychedelics (REBUS) model to shed light on these irregularities.
The implications of the REBUS model in a therapeutic setting are also considered. While both models emphasise the cognitive mechanisms of perception, their hypotheses lack specificity; with this in mind, future directions for both theories are outlined.
The Neural Correlates of Perception & Predictive Processing
The earliest accounts of perception can be traced to the Pre-Socratic philosopher Democritus (born around 460 BC), who believed the sense of sight to be due to ‘eidolons’; these were small copies of real objects which consisted of minuscule, invisible particles called ‘atomos’ that fly into the eyes.[4] We now know visual perception to be due to light hitting the retina at the back of the eye.[5] The retina contains photoreceptors which convert light into electrochemical signals.[6] These signals are transmitted via the optic nerves to the visual cortex in the brain, which interprets them as images. The retina-geniculate-striate (RGS) pathway connects the eye and the cortex to form vision.[7],[8]
At the cognitive level, perception results from both top-down and bottom-up processing: the former draws on previous experience to predict stimuli, providing a reality check on the world, while the latter takes in new sensory information and makes sense of it.[9] Traditional bottom-up ‘feedforward’ models argue that perception is constructed by a sequence of spatiotemporal filters that extract progressively more complex features of the stimulus as it rises through the hierarchy of the visual cortex.[10] An overarching neurobiological and computational framework called predictive processing (PP)[11] has emerged to challenge these traditional feedforward models of visual perception.
PP is a multi-level ‘generative model’ which proposes that ‘perception involves the use of a unified body of acquired knowledge to predict the incoming sensory barrage’.[12] At the neuronal level, information flows hierarchically from the primary visual cortex (V1) to the secondary visual cortex (V2) and on through the higher visual areas (V3, V4 and V5) until it reaches the higher-level inferotemporal cortex. Discrepancies between higher- and lower-level predictors are signalled as ‘error signals’. Prediction of sensory input takes place through an iterative, hierarchical inferential process, using prediction and error units at each level. According to the PP hypothesis, the prediction units determine what is likely to be expected at the preceding level of the visual cortex. Error units, on the other hand, register deviations between sensory expectations and sensory input; they communicate these prediction errors upward through the hierarchy to revise predictions and reduce visual ambiguities.[13]
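The exchange between prediction and error units can be illustrated with a toy simulation. What follows is a minimal sketch and not the model used in any of the studies cited here: it assumes a hierarchy of only two levels, a single number for each level's state, identity mappings between levels, and equal weighting of every error. The names `run_hierarchy`, `prior`, and `step` are hypothetical and chosen purely for illustration.

```python
def run_hierarchy(sensory_input, prior, steps=500, step=0.1):
    """Toy two-level predictive hierarchy with scalar states (illustrative only)."""
    mu_low = prior   # prediction unit at the lower level
    mu_high = prior  # prediction unit at the higher level
    for _ in range(steps):
        # Error units: mismatch between each level and the prediction sent down to it.
        err_input = sensory_input - mu_low  # input vs. lower-level prediction
        err_low = mu_low - mu_high          # lower level vs. higher-level prediction
        err_high = mu_high - prior          # higher level vs. its fixed expectation
        # Each prediction is pulled up by the error arriving from below
        # and held back by the error it generates against the level above.
        mu_low += step * (err_input - err_low)
        mu_high += step * (err_low - err_high)
    return mu_low, mu_high


low, high = run_hierarchy(sensory_input=3.0, prior=0.0)
print(round(low, 3), round(high, 3))
```

With an input of 3 and an expectation of 0, the two levels settle at roughly 2 and 1: the surprise is shared out across the hierarchy, with the level closest to the senses moving furthest from the original expectation.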
While predictions about the real world move from high- to low-level predictors, errors flow in the opposite direction, from lower (V1) to higher levels (V5). Noisy sensory data are addressed by a precision, or confidence, weighting of sensory input by higher-level systems. This means low-precision errors are largely ignored, whereas high-precision errors are more likely to influence the revision of sensory expectations. Perception is thus the process of determining the perceptual hypothesis that best predicts sensory information and, as a result, minimises prediction error.[14] Using the PP model, a variety of perceptual and neuronal phenomena such as object recognition, face recognition and motion perception can be better understood.
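Precision weighting lends itself to a small numerical illustration. The sketch below is a simplification under stated assumptions rather than an implementation from the literature: a single belief, one prior expectation, one sensory sample, and Gaussian-style precisions given as plain numbers. The function name `update_belief` and all parameter values are hypothetical.

```python
def update_belief(prior_mean, prior_precision, sensory_input, sensory_precision,
                  steps=500, step=0.02):
    """Settle on the belief that minimises precision-weighted prediction error."""
    belief = prior_mean
    for _ in range(steps):
        sensory_error = sensory_input - belief  # bottom-up prediction error
        prior_error = belief - prior_mean       # departure from the expectation
        # Each error moves the belief in proportion to its precision (confidence).
        belief += step * (sensory_precision * sensory_error
                          - prior_precision * prior_error)
    return belief


# Same expectation (0) and same input (10); only the confidence in the input differs.
noisy = update_belief(0.0, 4.0, 10.0, sensory_precision=1.0)
reliable = update_belief(0.0, 4.0, 10.0, sensory_precision=16.0)
print(round(noisy, 3), round(reliable, 3))
```

When the input is treated as noisy, the belief stays close to the expectation (about 2); when the same input is treated as reliable, it moves most of the way towards the input (about 8). The error is the same size in both runs, and only its weighting changes.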
Object recognition and illusions
We can detect many of the objects around us with great detail and accuracy, taking into account distances, orientations, and colour. Biederman’s recognition-by-components (RBC) theory[15] provides an interesting insight into this phenomenon but is limited by theoretical constraints that are better addressed by PP. The central tenet of RBC theory is that all objects are composed from a set of 36 possible components or ‘geons’ (geometric ions). These include blocks, spheres, and arcs, for example. The key prediction of the theory is that object recognition is ‘viewpoint-invariant’; this means an object can be recognised from any orientation. This was seen in an experiment where a to-be-named object was preceded by a prime.[16] Object-naming was correct even after an angular change of 135°, consistent with the hypothesis.
The issue with this experiment was that familiar objects were used. When the experiment was repeated using novel objects, participants performed poorly: object recognition was viewpoint-dependent, and better performance correlated with familiar viewpoints.[17] RBC theory stresses the importance of bottom-up viewpoint-invariance in object recognition. The experiment by Tarr and Bülthoff, however, is more in line with PP, which takes a top-down approach to object recognition wherein predictions are constantly updated with familiarity, in response to large prediction errors (unfamiliarity), to form a next best guess of what an object is.
To demonstrate the above point, this review will employ Adelson’s Checkershadow Illusion[18] (see Figure 1). Despite appearing darker, tile A is the same shade of grey as tile B (Adelson, 2005). In this scenario, lightness constancy, the ability of the visual system to perceive the reflectance of an object despite changes in illumination, is what characterises the illusion (Adelson, 2000). The brain is relying on its prior expectations, which are embedded in the visual cortex: a cast shadow normally dims the appearance of a surface, so the brain compensates for the shadow and tile B appears lighter than it really is. The phenomenon of ‘filling-in’ occurs in the V1 and V2 systems with the information at hand.[19] When new information is provided (see Figure 3B), the brain updates its predictions to the next best guess of what the image is.[20],[21] This is one of many practical examples of PP. The theory has also been tested empirically, with some promising results.
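The logic of lightness constancy in the checkershadow illusion can be put into numbers. This is a deliberately simplified sketch, assuming that the light reaching the eye (luminance) is the product of surface reflectance and illumination, and that the brain divides out the illumination it expects. The function name `inferred_reflectance` and every value used are hypothetical and are not measurements from Adelson’s image.

```python
def inferred_reflectance(luminance, assumed_illumination):
    """Reflectance implied by a luminance if luminance = reflectance * illumination."""
    return luminance / assumed_illumination


luminance = 0.3  # identical light reaching the eye from both tiles (hypothetical value)

# Tile A is taken to be directly lit; tile B is taken to lie in a cast shadow.
tile_a = inferred_reflectance(luminance, assumed_illumination=1.0)
tile_b = inferred_reflectance(luminance, assumed_illumination=0.5)
print(tile_a, tile_b)
```

Although both tiles send the same amount of light to the eye, the expectation of a shadow over tile B leads to a higher inferred reflectance for it, which is one way of expressing why it looks lighter than tile A.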
In addition to the above example, the Kanizsa illusion[22] (see Figure 2) demonstrates a similar outcome. The illusion, or prediction, of a triangle is due to the interpretation of the stimulus based on perceptual priors.[23] Using this experimental design, Kok & de Lange employed functional magnetic resonance imaging (fMRI),[24] an imaging technique that assesses changes in blood-oxygen-level-dependent (BOLD) activity in targeted areas of the brain.
The results showed an increase in BOLD activity in visual areas V1 and V2 which mapped to the illusory triangle. It is unclear, however, whether this activity is due to error signals in the absence of actual triangular contours or to the hierarchical prediction process itself. This ambiguity was examined in single-unit recording studies, which use an invasive method in which an electrode records electrophysiological activity from a single neuron. Results showed that although both V1 and V2 are involved in the detection of illusory contours, V2 neurons respond before V1 neurons, highlighting a descending hierarchical cascade as proposed by the hypothesis.[25], [26]
Noticeable error signals are also associated with PP. This was seen in virtual reality paradigms in mice, where discrepancies between sensory predictions and sensory inputs were relayed as error units from the anterior cingulate cortex (ACC) of the brain to V1.[27], [28] This in turn signalled predictive information about approaching visual flow[29] or grating stimuli, i.e. a series of parallel, elongated elements of varying luminance.[30] The signals depended on the mouse’s movements and on the mismatch between actual and expected visual stimuli (Keller et al., 2012), as would be expected of PP error neurons. Furthermore, experience seemed to modulate this activity, with prediction errors continuously updated.
Leinweber et al. (2017) demonstrated this point: in mice trained in a left-right inverted visual paradigm, activity from the ACC to V1 correlated positively with left turns, thus creating a new visuomotor coupling.[31] The above studies provide evidence of distinct neuronal subpopulations, a feature that sets PP apart from traditional feedforward models. The constraints on identifying PP neuronal subpopulations arise from methodological differences across studies. For example, differences in outcomes between global non-invasive imaging techniques[32], [33] and invasive single-unit recordings[34], [35] mean there is no clear-cut definition of the specific neuronal ‘prediction’ and ‘error’ subpopulations proposed by PP, which can give rise to replication failures. Going forward, researchers must utilise robust methods to outline the functional properties of distinct cortical subpopulations.
Another central tenet of PP is that perceptual information processing takes place through an inferential hierarchical network that accounts for precision weighting.[36] This means the information processed at higher-level cortices will override lower-level present stimuli. Face recognition and motion perception, for example, involve higher-level visual systems.[37], [38] The fusiform face area (FFA) is specialised for facial recognition and is part of the inferotemporal cortex.[39] To demonstrate this, activity in the macaque monkey’s face processing system was recorded at three hierarchical levels as typical and atypical facial features were presented.[40]
Researchers found that neural subpopulations at lower-level FFA responded to abnormal facial contours in the late response phase, whereas neurons in higher-level areas of the brain preferred recognisable facial configurations based on previous experience. Motion perception, on the other hand, is processed through the V5 system of the visual cortex.[41] Apparent motion is an optical illusion in which stationary objects viewed in quick succession appear to be in motion (e.g., in animations). This occurs due to enhanced feedback connectivity from V5 to the lower-level V1 system.[42] The effect is eliminated upon transcranial magnetic stimulation of V5,[43] a non-invasive technique in which a changing magnetic field is used to induce an electric current in a specific part of the brain. Similar effects were seen in patients with brain injury,[44], [45] highlighting the predictive influence of V5 and its modulatory effect on V1 activity for motion prediction.
These findings are consistent with the theory of inferential hierarchical processing, whereby the brain overrides actual sensory information at a lower level and favours higher-level information based on previous experience. However, evidence is limited for several processing levels of the hierarchy (e.g. V3, V4); this means the empirical data could equally be accommodated by traditional models, which undermines the case for PP.
Overall, there is sufficient evidence favouring PP, albeit with some methodological constraints. One caveat is that the predictive process seems to be maladaptive in individuals who perceive things that aren’t real. Such experiences include hallucinations, or perceptions not based on sensory input, which are typical of individuals with schizophrenia.[46] Understanding these abnormalities is important for the development of clinical interventions. Studies utilising psychedelic hallucinogens aim to address this problem by bridging the gap between cognition and conditions such as psychosis.[47]
Psychedelics, REBUS and Hallucinations
The word psychedelic derives from the Greek for mind (psyche) and manifesting (deloun).[48] Psychedelics are a class of psychoactive substances, or serotonergic hallucinogens, that are agonists of the serotonin (5-hydroxytryptamine) 2A (5-HT2A) receptor[49], [50] and produce changes in perception, thoughts, feelings, mood, and cognitive processes.[51] Carhart-Harris and Friston proposed the ‘relaxed beliefs under psychedelics’ (REBUS) model to account for these alterations in perception under the influence of psychedelics.[52]