If you are randomly splitting the data, then I assume you are talking about a small difference, and this is fine. However, writing this answer got me thinking, and now I'm wondering whether the EFA needs more data than the CFA. The EFA is a much more complex model because it estimates all the cross-loadings, and it involves additional difficult questions, such as what the optimal number of factors is. So perhaps more data should be used in that part of the analysis.
I started by answering your questions and have now generated more confusion than clarity. Maybe some of the clever RG folk can help with this.
Indeed, I found myself confused because I believed that CFA requires a larger sample size than EFA. My rationale was that CFA tests a theoretical model with pre-defined factors and might therefore require more data for reliable results.
However, I couldn't find any papers to support this idea. Instead, some papers simply split their data into two halves randomly, or use the DUPLEX or Solomon method to split the data.
When splitting data between EFA and CFA, it is essential to ensure that both subsamples represent the overall population. The sample sizes do not need to be equal, but both should be adequate to meet the requirements of the respective analyses. Sometimes, data is split unevenly, with a larger portion allocated to CFA to accommodate its stricter requirements for sample size and model testing.
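For concreteness, here is a minimal sketch of an uneven random split in Python. The file name `item_responses.csv` and the 40/60 ratio are assumptions for illustration; the larger portion is allocated to the CFA subsample, as suggested above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical item-level data: one row per respondent, one column per item.
df = pd.read_csv("item_responses.csv")

# Random 40/60 split: the smaller part for the EFA, the larger for the CFA.
efa_sample, cfa_sample = train_test_split(
    df,
    test_size=0.60,     # 60% reserved for the CFA
    random_state=2024,  # fixed seed so the split is reproducible
    shuffle=True,
)

print(len(efa_sample), len(cfa_sample))
```

This is only the simplest random split; whatever splitting method is used, it is worth checking that key descriptive statistics look similar in both subsamples.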
Brown, T. A. (2015). Confirmatory Factor Analysis for Applied Research (2nd ed.). Guilford Press. This book offers a comprehensive treatment of confirmatory factor analysis, including discussions of sample size requirements and model validation.
I'd recommend starting with a parallel analysis before running either the EFA or the CFA, because it is critical to settle on the number of factors to extract and rotate first. Even if one has a theoretical model in mind at the outset, one needs to recognize that it might be wrong.
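As a rough illustration, here is a minimal sketch of Horn's parallel analysis using only NumPy. The array name `X`, the number of iterations, and the percentile are assumptions; dedicated packages exist for this, but the core idea is simply comparing the observed eigenvalues against those from random data of the same size.

```python
import numpy as np

def parallel_analysis(X, n_iter=100, percentile=95, seed=0):
    """Suggest a number of factors by comparing observed eigenvalues of the
    item correlation matrix with eigenvalues from random normal data of the
    same dimensions (n observations x p items)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Eigenvalues of the observed correlation matrix, largest first.
    obs_eigs = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))[::-1]

    # Eigenvalues of correlation matrices from random data of the same shape.
    rand_eigs = np.empty((n_iter, p))
    for i in range(n_iter):
        R = rng.normal(size=(n, p))
        rand_eigs[i] = np.linalg.eigvalsh(np.corrcoef(R, rowvar=False))[::-1]

    # Threshold for each eigenvalue position (e.g., the 95th percentile).
    threshold = np.percentile(rand_eigs, percentile, axis=0)

    # Retain factors as long as the observed eigenvalue exceeds the threshold.
    n_factors = 0
    for obs, thr in zip(obs_eigs, threshold):
        if obs > thr:
            n_factors += 1
        else:
            break
    return n_factors, obs_eigs, threshold
```

The suggested number of factors from such a run can then inform both the EFA extraction and the structure one proposes to test in the CFA.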