Lotman wrote about texts and documents as signs, each one a sign in itself. Length matters at least in so far as it measures the parts or chapters of a text, so we should think about it when planning or deciding on content, as always happens in the highly relative domain of poetry, which deals with expressing and uttering feelings...
The length of a text can never be a measure of the richness of its content. That richness depends on the skill or gift of using language smartly to express content in the fewest words and sentences. The richest texts are those that are brief and expressive. I am afraid that word counts would not give a reliable measure of content.
No. Prime example: any speech given by Mr Donald John Trump. A spoken text, typically overlong. Zero coherence, hard to comprehend, and barely anything that can be called "information content".
A text is not a piece of a mathematical problem. It depends on certain features, such as coherence and cohesion. You can write a full paper to describe one feeling, sadness let us say. Still, you can cut through all of that with a single sentence. It comes down to a layered relation: thought or idea as related to meaning, meaning as related to wording, and finally how all of these are realized in a well-constructed structure.
Certainly, you're right, Dr. Sara. The meaning of a text should show adequate coherence and cohesion, but also relevance and pertinence. Following Lotman, a text is more than a string of grammatical words driven by emotions, feelings or reasons: every text or document is always a sign of its time...
DEFINITELY, that is the point, dear Jaume. By the way, unfortunately, I'm not a Dr. yet.
What we are after is defining a text according to certain features and qualities that sculpt textuality. The length of the text is, on no occasion, one of those features and qualities.
Sorry, dear Sara, it is all a question of time, in our minds and in our skills. The definition is a thorny problem. Still, one can begin with length, which is neither the content nor the reference, and which points to the forms found in the confused mass of schools and traditions, as old as such short achievements as epistolography, epigrams, elegies, panegyrics and so on...
I agree with the replies above. However, if you have a specific task, you could come up with an appropriate metric to calculate the information content. For instance, in this kind of forum you have a question and various replies. You could then calculate the similarity of each follow-up reply to the earlier ones. The more similar a reply is, the less new information it introduces (that is an assumption). You can probably construct counterexamples easily, but this might be one feature (among many others) for estimating information content. Similarities could be calculated with document embeddings/representations or with chi-square, for instance.
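To make the idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it swaps in a plain bag-of-words cosine similarity instead of document embeddings or chi-square, the function names (bag_of_words, cosine_similarity, novelty_scores) and the sample replies are hypothetical, and "novelty" is simply defined here as one minus the highest similarity to any earlier reply.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty_scores(replies):
    """For each reply, return 1 - (max similarity to any earlier reply).

    Assumption: a reply that closely resembles an earlier one adds
    little new information. The first reply has nothing before it,
    so it always scores 1.0.
    """
    vectors = [bag_of_words(reply) for reply in replies]
    scores = []
    for i, vec in enumerate(vectors):
        max_sim = max(
            (cosine_similarity(vec, earlier) for earlier in vectors[:i]),
            default=0.0,
        )
        scores.append(1.0 - max_sim)
    return scores


# Hypothetical replies, invented for illustration only.
sample_replies = [
    "Length is not a measure of content.",
    "Length is not a measure of content.",
    "Coherence and cohesion matter more.",
]

for reply, score in zip(sample_replies, novelty_scores(sample_replies)):
    print(f"{score:.2f}  {reply}")
```

A verbatim repeat of an earlier reply scores close to zero, while a reply that shares no words with the earlier ones scores 1.0. Word overlap is a crude proxy, of course: a paraphrase with different vocabulary would look "new" here, which is where embeddings might do better.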