Without a neocortex, language processing in humans is impossible (Kimura 1993; Ojemann 1983, 1991; Penfield and Roberts 1966), and without a hippocampus (but with an intact neocortex and cerebellum) new language associations cannot be consolidated into long-term memory (Corkin 2002). Noam Chomsky, the father of modern linguistics, made two bold claims. First, some 60 years ago, he declared that all humans have a universal grammar that is genetically based and that explains why language acquisition is so rapid in young children (Chomsky 1965). Second, in his later work, he proposed that a central process in language acquisition is an operation called ‘Merge’, which takes two syntactic elements ‘a’ and ‘b’ and merges them to form ‘a + b’. For example, ‘the’ and ‘apple’ are combined to yield ‘the apple’. Merge can apply to its own output, so that ‘ate’ can be combined with ‘the apple’ to yield ‘ate the apple’. Language is thus built up from component parts by repeated application of Merge. The basic elements of language (whether auditory or visual) are stored in Wernicke’s and Broca’s areas in a declarative format (Corkin 2002; Penfield 1975; Penfield and Roberts 1966; Scoville and Milner 1957; Squire and Knowlton 2000; Squire et al. 2001), and because this storage reflects the learning history of each individual, the resulting linguistic map is unique to that individual (Ojemann 1991).
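The recursive character of Merge can be illustrated with a minimal Python sketch. The names `merge`, `linearize`, and `depth` are hypothetical and chosen only for illustration. The sketch assumes that a merged object can be represented as an ordered pair; in Chomsky's formal treatment the output of Merge is an unordered set {a, b}, so the tuple used here is a simplification that keeps word order for display.

```python
from typing import Tuple, Union

# A syntactic object is either a single word (str) or the output of an earlier Merge.
SyntacticObject = Union[str, Tuple["SyntacticObject", "SyntacticObject"]]


def merge(a: SyntacticObject, b: SyntacticObject) -> SyntacticObject:
    """Combine two syntactic objects into a new one.

    Simplifying assumption: an ordered pair stands in for the unordered set {a, b}.
    """
    return (a, b)


def linearize(obj: SyntacticObject) -> str:
    """Flatten a merged object back into a space-separated string of words."""
    if isinstance(obj, str):
        return obj
    left, right = obj
    return f"{linearize(left)} {linearize(right)}"


def depth(obj: SyntacticObject) -> int:
    """Count the deepest chain of nested Merge operations in an object."""
    if isinstance(obj, str):
        return 0
    return 1 + max(depth(obj[0]), depth(obj[1]))


noun_phrase = merge("the", "apple")   # 'the' + 'apple' -> 'the apple'
verb_phrase = merge("ate", noun_phrase)  # Merge applied to its own output

print(verb_phrase)             # ('ate', ('the', 'apple'))
print(linearize(verb_phrase))  # ate the apple
print(depth(verb_phrase))      # 2
```

Because `merge` accepts its own output as an argument, arbitrarily deep structures can be built from a small set of words, which is the property described above as Merge applying to its own output.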
The neocortex of mammals was designed to make associations at the synaptic level, as is well established (Hebb 1949, 1961, 1968; Kandel 2006; see also Pavlov 1927, p. 328, who found that classical conditioning is rendered ineffective 4.5 years after neocortical removal, whereas ‘vegetative’ conditioning remains intact; Gallistel 2022). Normally, electrical stimulation of M1 (the motor cortex) yields a muscle twitch, but after stimulation of M1 has been temporally paired with stimulation of V1 (the visual cortex), stimulation of V1 alone comes to evoke a muscle twitch (Baer 1905; Doty 1965, 1969). Furthermore, V1 conditioning depends on descending pyramidal fibres (Logothetis et al. 2010; Rutledge and Doty 1962; Tehovnik and Slocum 2013), which means that subcortical circuits must be involved in the learning process. And we already know which subcortical structures are important here: the hippocampus consolidates declarative information at the level of the neocortex (Corkin 2002; Penfield 1975; Penfield and Roberts 1966; Scoville and Milner 1957; Squire and Knowlton 2000; Squire et al. 2001; Swain, Thompson et al. 2011), and the cerebellum converts that declarative information into executable code, e.g., to drive the vocal cords for speaking and the hands for writing (Tehovnik, Hasanbegović, Chen 2024).
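The pairing result can be illustrated with a toy Hebbian model in Python. This is a sketch under stated assumptions, not a model of the circuit studied by Doty: a single output unit stands in for the muscle twitch, the M1 input is assumed to drive it strongly from the outset, the V1 input starts with zero weight, and the threshold, learning rate, and function names are hypothetical. It also ignores the descending pyramidal fibres and subcortical structures on which, as noted above, the real learning depends.

```python
# Toy Hebbian model (illustrative assumption, not the actual cortical circuit).
# One output unit stands for the muscle twitch. It receives an M1 input with a
# fixed strong weight and a V1 input whose weight starts at zero.

THRESHOLD = 0.5      # hypothetical firing threshold of the output unit
LEARNING_RATE = 0.1  # hypothetical Hebbian learning rate
W_M1 = 1.0           # hypothetical M1 -> twitch weight, strong from the outset


def twitch(m1: float, v1: float, w_v1: float) -> float:
    """Return 1.0 (twitch) if the summed drive reaches threshold, else 0.0."""
    drive = W_M1 * m1 + w_v1 * v1
    return 1.0 if drive >= THRESHOLD else 0.0


def hebbian_update(w: float, pre: float, post: float) -> float:
    """Hebb's rule: strengthen a synapse when pre- and postsynaptic activity coincide.

    The weight is capped at 1.0 to keep it bounded (a modelling assumption).
    """
    return min(1.0, w + LEARNING_RATE * pre * post)


def pair(n_trials: int, w_v1: float = 0.0) -> float:
    """Stimulate M1 and V1 together n_trials times; return the learned V1 weight."""
    for _ in range(n_trials):
        post = twitch(m1=1.0, v1=1.0, w_v1=w_v1)  # M1 alone is enough to fire the unit
        w_v1 = hebbian_update(w_v1, pre=1.0, post=post)
    return w_v1


print(twitch(m1=0.0, v1=1.0, w_v1=0.0))  # before pairing, V1 alone: 0.0
learned = pair(n_trials=10)
print(twitch(m1=0.0, v1=1.0, w_v1=learned))  # after pairing, V1 alone: 1.0
```

The design choice is deliberately minimal: because the M1 input fires the output unit on every paired trial, the coincident V1 activity is strengthened until V1 stimulation alone crosses threshold, which mirrors the qualitative outcome of the pairing experiments.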
Hence, the neocortex, the hippocampus, and the cerebellum together are necessary for humans to acquire language as envisioned by Chomsky (1965). This capacity evolved from mechanisms already present in mammals and other vertebrates (i.e., a telencephalon and a cerebellum) and was passed on to archaic Homo sapiens some five hundred thousand years ago (Kimura 1993), although some believe that the basic elements of language already existed in Homo erectus 2.5 million years ago (Everett 2016).
Note: Optogenetic activation of two microzones of Purkinje neurons in the cerebellar flocculus (one for horizontal movement and one for vertical movement) induces precise movement of the ipsilateral eye of the mouse (from Fig. 5 of Blot, De Zeeuw et al. 2023). The precision is such that each eye receives independent innervation for the vestibulo-ocular reflex (VOR) and optokinetic nystagmus (OKN); this independence allows the eyes to verge across different depth planes. Although comparable data for driving the vocal cords are not yet available, distinct microzones must be activated when we learn to speak a language. This is how declarative information in the neocortex is converted into a motor response (a sound) during learning. There is thus no need to invoke abstract concepts to explain Chomsky’s ‘Merge’, since the brain can be explained biologically.