One could argue that a much more empirical set of data could be used to explain concept development, with the hypotheses DIRECTLY TESTED: data based on concrete, directly observable overt behavior patterns, detectable with eye-tracking technology at key times and in "real time" (i.e., in the behavior patterns current at that moment). Start with the following Question:
https://www.researchgate.net/post/A_Beginning_of_a_Human_Ethogram_seeing_the_inception_of_cognitive-developmental_stages_as_involving_a_couple_of_phases_of_non-conscious_perception
The "sensori-motor" explanations have turned out not to be well-founded. They are based on VERY indirect evidence, at best, and have been described IN PEER REVIEW as having "no future":
Article: "The poverty of embodied cognition"