This is a fundamental problem that lies beyond the conventional focus of linguistics on syntax and semantics at or below the sentence level. These gaps need to be filled before cognitive computers and computational linguistic systems can practically process documents and web pages written in natural language.
Rhetorical Structure Theory (RST) is a state-of-the-art approach to this issue, although it has several shortcomings. In particular, RST is weak at identifying the linguistic objects that establish coherence and are therefore responsible for the compositional semantics of a text. Episodic Logic is another interesting, much more formal approach. See here: https://urresearch.rochester.edu/institutionalPublicationPublicView.action?institutionalItemId=20115
You might also consider Discourse Representation Theory (DRT), a formal semantic model of how text is processed in context, with applications in discourse understanding. DRT was originally formulated in (Kamp, 1981) and further developed in (Kamp & Reyle, 1993), with a concise technical summary in (van Eijck & Kamp, 1997). DRT grew out of Montague's model-theoretic semantics (Thomason, 1974), which represents the meanings of utterances as logical forms and supports the calculation of their truth conditions. DRT addresses a number of difficulties in text understanding (e.g. anaphora resolution) that arise at the level of the discourse rather than the single sentence.
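To make the idea of a discourse-level representation a little more concrete, here is a minimal toy sketch of a flat Discourse Representation Structure (DRS): a universe of discourse referents plus a list of conditions, built up incrementally over two sentences so that a pronoun in the second sentence can pick up a referent introduced in the first. Everything here is a hypothetical illustration, not Kamp's construction algorithm: the class `DRS` and the methods `introduce`, `resolve`, `add` and `true_in` are invented names, the gender feature plus "most recent match" rule is a crude stand-in for real anaphora resolution, and the structure is flat, so it omits the nested sub-DRSs and accessibility constraints (from conditionals, negation, etc.) that DRT actually uses. The `true_in` method does follow the standard idea that a DRS is true in a model if some embedding of its referents into the domain verifies every condition.

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class DRS:
    """Toy flat DRS: discourse referents plus (predicate, *args) conditions."""
    referents: list[str] = field(default_factory=list)
    conditions: list[tuple[str, ...]] = field(default_factory=list)
    # Hypothetical agreement feature used only by the toy resolver below.
    gender: dict[str, str] = field(default_factory=dict)

    def introduce(self, noun: str, gender: str) -> str:
        """An indefinite NP ("a farmer") adds a fresh referent and a condition."""
        ref = f"x{len(self.referents) + 1}"
        self.referents.append(ref)
        self.gender[ref] = gender
        self.conditions.append((noun, ref))
        return ref

    def resolve(self, gender: str) -> str:
        """Toy pronoun resolution: most recent referent with matching gender."""
        for ref in reversed(self.referents):
            if self.gender[ref] == gender:
                return ref
        raise ValueError(f"no antecedent available for gender {gender!r}")

    def add(self, pred: str, *args: str) -> None:
        self.conditions.append((pred, *args))

    def true_in(self, domain: list[str], interp: dict[str, set]) -> bool:
        """True iff some assignment of individuals to referents verifies all conditions."""
        for values in product(domain, repeat=len(self.referents)):
            g = dict(zip(self.referents, values))
            if all(tuple(g[a] for a in args) in interp.get(pred, set())
                   for pred, *args in self.conditions):
                return True
        return False


# "A farmer owns a donkey. He beats it."
drs = DRS()
farmer = drs.introduce("farmer", "masc")
donkey = drs.introduce("donkey", "neut")
drs.add("owns", farmer, donkey)
# The pronouns in the second sentence link back to referents from the first.
drs.add("beats", drs.resolve("masc"), drs.resolve("neut"))

# A small hypothetical model to evaluate the discourse against.
domain = ["pedro", "chiquita", "rosa"]
interp = {
    "farmer": {("pedro",)},
    "donkey": {("chiquita",)},
    "owns": {("pedro", "chiquita")},
    "beats": {("pedro", "chiquita")},
}
print(drs.conditions)
print(drs.true_in(domain, interp))
```

The point of the design is that the second sentence is interpreted by extending the same structure rather than by building a separate logical form, which is what lets "he" and "it" refer back across the sentence boundary. A full DRT treatment would replace the recency heuristic with the accessibility relation over nested boxes.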