Hi everyone,
I’m currently building a simple GitHub-style version-control system combined with a Cursor-like interface for text editing. The goal is scalable, deterministic compression of AI-generated content: instead of storing the generated text itself, store the prompts and generation parameters needed to reproduce it on demand.
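To make that concrete, here is a minimal sketch of what I mean by a "generation recipe" that stands in for the content it can regenerate. All names and fields are illustrative assumptions on my part, not tied to any particular library, and they assume decoding is deterministic (greedy or fixed-seed):

```python
from dataclasses import dataclass, asdict
import hashlib
import json

@dataclass(frozen=True)
class GenerationRecipe:
    """Everything needed to reproduce a piece of AI-generated text.

    Storing this recipe instead of the output is the 'compression':
    the text can be regenerated on demand, provided decoding is
    deterministic (greedy or fixed-seed sampling).
    """
    model_id: str          # illustrative identifier of the generating model
    prompt: str
    temperature: float
    top_p: float
    seed: int
    max_new_tokens: int

    def content_key(self) -> str:
        """Stable hash of the recipe, usable as a commit/blob id."""
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


# Hypothetical usage: the version-control layer stores only the recipe
# and its hash; the full text is materialized by re-running generation.
recipe = GenerationRecipe(
    model_id="example/model",
    prompt="Summarize the design doc in three bullet points.",
    temperature=0.0,      # greedy decoding keeps regeneration deterministic
    top_p=1.0,
    seed=42,
    max_new_tokens=256,
)
print(recipe.content_key())
```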
The recent MemOS paper by Zhiyu Li et al. introduces an operating system abstraction over parametric, activation, and plaintext memory in LLMs, which closely aligns with the core challenges I’m tackling.
I’m particularly interested in whether granular manipulation of parametric or activation memory states (such as cached key/value states) at inference time is actually feasible, specifically to enable efficient regeneration of content without replaying long prompt chains from scratch.
Understanding this could be game-changing for scaling deterministic compression in AI workflows.
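For concreteness, this is roughly the kind of activation-state manipulation I have in mind, sketched with the Hugging Face transformers prompt-caching pattern (recent versions); the model, prompts, and helper function are purely illustrative. The idea: encode a long shared prefix once, checkpoint its KV cache, then regenerate different continuations without re-running the prefix through the model.

```python
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# "gpt2" is only a stand-in; any decoder-only causal LM would do.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Run the long shared prefix (the "prompt chain") through the model once
# and keep its activation state (the KV cache) as a reusable checkpoint.
prefix = "Project notes: imagine a very long prompt chain here."
prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
with torch.no_grad():
    cached_state = model(prefix_ids, use_cache=True).past_key_values

def continue_from_cache(suffix: str, max_new_tokens: int = 20) -> str:
    """Decode a continuation deterministically, reusing the cached prefix state."""
    suffix_ids = tokenizer(suffix, return_tensors="pt").input_ids
    input_ids = torch.cat([prefix_ids, suffix_ids], dim=-1)
    with torch.no_grad():
        out = model.generate(
            input_ids,
            # generate() mutates the cache, so hand it a copy each time
            past_key_values=copy.deepcopy(cached_state),
            max_new_tokens=max_new_tokens,
            do_sample=False,  # greedy decoding keeps regeneration deterministic
        )
    return tokenizer.decode(out[0, input_ids.shape[-1]:])

# Two different edits regenerate from the same checkpoint without
# re-encoding the long prefix.
print(continue_from_cache(" Summary:"))
print(continue_from_cache(" Action items:"))
```

What I don’t know is whether this kind of checkpointing can be made granular and robust enough (across edits, model versions, and batching) to serve as a real storage primitive, which is where MemOS-style memory management seems relevant.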
Any insights, references, or experiences would be greatly appreciated.
Thanks in advance.