I would suggest employing the AI industry standard for training foundation models: human feedback. Frontier models are pretrained with reinforcement learning over large swaths of data, but it's the human feedback, and the fine-tuning built on top of it, that allows precise adjustment of the parameters.
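To make that concrete, the reward model at the heart of RLHF is commonly trained with a Bradley-Terry pairwise objective: given two responses and a human label saying which one is preferred, the loss rewards a larger score margin for the preferred response. A minimal sketch (the function name and example scores are illustrative, not from any particular library):

```python
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry negative log-likelihood that the human-preferred
    response outscores the rejected one, given scalar reward scores."""
    # Sigmoid of the reward margin: the model's probability of agreeing
    # with the human preference label.
    p_chosen = 1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected)))
    return -math.log(p_chosen)

# A larger margin in favor of the human-preferred answer yields a lower loss.
low = preference_loss(2.0, 0.5)   # model agrees with the human label
high = preference_loss(0.5, 2.0)  # model disagrees with the human label
```

Minimizing this loss over many human comparisons is what turns raw preference data into a trainable signal for fine-tuning.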
GPT-4 and Claude both let you attach documents and analyze them with natural-language prompts, which gets past the inherent friction of not knowing where to start. And of course, check against the literature on arXiv if it's more computer science/physics, and medRxiv if it's more health sciences, though there's overlap on both.
It also depends on the contribution of the work. Beyond that, you have to check the application it presents, the analysis, and the results.