I think a generative AI tool such as ChatGPT could be more useful if its information were based on academic and professional material gathered from books, articles, newspapers, magazines, and similar sources.
Yes, generative AI can be specifically trained to function as a knowledge base, distinct from a generic large language model (LLM) trained on broad internet data.
Knowledge Base-Focused Training
Generative AI models can be developed using curated academic and professional materials—such as books, peer-reviewed articles, industry reports, and scholarly databases—to create a domain-specific knowledge base. This targeted approach increases the accuracy and reliability of generated information by grounding outputs in trusted sources rather than the general internet.
Differences from Standard LLMs
Standard LLMs (like ChatGPT) are trained on large, diverse datasets from the web, which can sometimes introduce errors, outdated facts, or unverified claims. In contrast, a knowledge base-focused generative AI limits its training corpus to vetted academic and professional sources, so its responses are more likely to be evidence-based and authoritative.
Approaches and Best Practices
Use retrieval-augmented generation (RAG): the model generates answers by fetching relevant content from a structured knowledge base at inference time (a minimal retrieval sketch follows this list).
Fine-tune LLMs on selected professional corpora to adapt their language generation to organizational or research standards (a fine-tuning sketch also follows the list).
Implement version control and regular updates of training material to ensure up-to-date knowledge.
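As a minimal sketch of the RAG pattern: the snippet below indexes a small, hypothetical set of vetted passages with TF-IDF, retrieves the passages most relevant to a query, and assembles a grounded prompt. The example documents, the retrieve and build_prompt helpers, and the scikit-learn retriever are illustrative assumptions; a production system would more likely use an embedding model and a vector database, and the final generation call to an LLM is deliberately left out.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical curated knowledge base: vetted passages drawn from books,
# peer-reviewed articles, and industry reports.
documents = [
    "Peer review is a process in which experts evaluate scholarly work before publication.",
    "Retrieval-augmented generation grounds model outputs in documents fetched at query time.",
    "Fine-tuning adapts a pretrained language model to a narrower, domain-specific corpus.",
]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k passages most similar to the query (TF-IDF cosine similarity)."""
    query_vector = vectorizer.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [documents[i] for i in top_indices]

def build_prompt(query: str) -> str:
    """Assemble a grounded prompt; the generation step itself would call
    whatever LLM the application uses (not shown here)."""
    context = "\n".join(f"- {passage}" for passage in retrieve(query))
    return (
        "Answer the question using only the sources below.\n"
        f"Sources:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("How does retrieval-augmented generation improve accuracy?"))
```

The point of the sketch is that the model only ever sees content pulled from the curated store, which is what keeps answers traceable to approved sources.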
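For the fine-tuning approach, one possible sketch using the Hugging Face transformers and datasets libraries is shown below. It assumes a local file named curated_corpus.txt containing vetted professional text and uses a small placeholder base model (gpt2); the file name, model choice, and hyperparameters are all illustrative assumptions rather than a prescribed recipe.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Hypothetical curated corpus: one vetted passage per line in a plain-text file.
dataset = load_dataset("text", data_files={"train": "curated_corpus.txt"})

model_name = "gpt2"  # placeholder base model for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

def tokenize(batch):
    # Truncate long passages so every example fits the model's context window.
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Causal language modeling (mlm=False) so the model learns to continue the corpus text.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="kb-finetuned-model",
    num_train_epochs=1,
    per_device_train_batch_size=2,
    save_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

In practice, fine-tuning and RAG are often combined: fine-tuning shapes tone and terminology, while retrieval keeps factual content anchored to the maintained knowledge base.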
Potential and Pitfalls
This method increases trustworthiness in domains like medicine, law, and academia. However, it requires ongoing curation and careful dataset construction to avoid bias and to stay current with new developments.
Summary
Training generative AI as a knowledge base is not only possible but increasingly adopted for applications demanding accuracy, traceability, and subject-matter expertise beyond what general-purpose LLMs provide.