Most authors seem to think so. Frege, one of the few mathematicians who worked on this problem, spoke of "unpredictably many" (or uncalculably many; "unabsehbar viele"), not of an infinite number of them (Frege, Logische Untersuchungen, 3. Teil: Gedankengefüge, 1993, p. 72 ff. (German); see also Fodor/Lepore, Holism, 1992, p. 242). N.B.: "uncalculably many" is not an NP problem in this context.
I could not find a proof that there are infinitely many sentences of, e.g., 30 words in any natural language. The arguments of Chomsky and Pinker are about the understanding of understandable sentences, not about sentences containing, say, 100,000 words.
Here is a reformulation of the question: how can we get infinitely many combinations, each containing up to 30 elements, out of a finite number of elements (e.g. English, with approximately 5 million words), even if we include ungrammatical combinations? We can't. In order to get more combinations we would have to extend the chains. But even with sentences of 100,000 words there would not be infinitely many of them.
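The combinatorial point can be made concrete with a back-of-the-envelope calculation. The vocabulary size (5 million) and maximum length (30 words) are the figures assumed above; the count includes every word string, grammatical or not.

```python
# Toy upper bound on the number of word strings, using the figures assumed
# in the text: a vocabulary of 5 million words and sentences of up to 30
# words, counting every string whether grammatical or not.
V = 5_000_000          # assumed vocabulary size
MAX_LEN = 30           # assumed maximum sentence length in words

# Number of word strings of length 1..MAX_LEN: V + V^2 + ... + V^MAX_LEN
total = sum(V**k for k in range(1, MAX_LEN + 1))

print(f"about 10^{len(str(total)) - 1} strings")  # astronomically large, yet finite
```

The number is astronomical, but it is a definite finite integer, which is exactly the point of the question.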
One simple solution is: "Of course there are infinitely many sentences of the form 'This and that is that'; we just have to insert numbers!" But that is not what I mean, and it is not what anyone means who is concerned with this problem. Moreover: first, we would then get an infinite vocabulary; second, all these sentences would be comprehensible by the application of one single rule. (Pace Kripke's Wittgenstein.)
How about propositions instead of sentences? E.g. we might utter “This is not the same as that” pointing to different objects while repeating the sentence. The first problem is that we get many propositions but we still have only one sentence. The second problem is that the propositions are not countable. Third problem: there must be a difference that makes a difference. The objects we want to point to must be discernible. And with a countable number of discernible objects we will only get a finite number of utterances.
So even for 7 billion people the pointing to discernible objects will lead to a still finite number of possible utterances.
Some authors compare the number of possible sentences with the number of possible chess games or possible molecules. I don’t know much about these items but it seems to me that for chess and molecules there will be problems comparable to those I mentioned above. Games that get longer and longer, molecules with more and more atoms. Does anyone have another idea?
Martin,
I wrote a big fancy thing, but decided to spare you all.
Natural language study involves several divisions, each with a means of modeling a component of the language capacity. In one sense, it is absurd that an infinite number of sentences would be used in a language or all languages when our science points to an explosion of the sun: Ending human existence limits the number of sentences used in a natural language by definition. However, the decision is not made by attacking the syntax. I agree with mainstream linguistics that the syntax is and should be designed to produce an infinite number of possible grammatical sentences. The problem is that we do not know which subset we will deal with, so we need to model a preparation for all of the possibilities. Likewise, we should not limit the predicate calculus or other means of determining the meaning of those sentences, so there is nothing wrong with generating an infinite set of possible interpretations of sentences. Indeed, modeling natural language requires a multiplicity of mathematical systems. Psycholinguistics will limit embedding clauses within a sentence, but that is not a mathematical limitation, unless psychologists continue the trend to be neurologists and can measure the complexity of the brain, etc., that limits the capacity to process language. I will leave that to them.
In short, the number of sentences that will be used in a language is finite—and that is trivial. However, using mathematical models that account for infinite possibilities, especially in syntax and semantics, is appropriate to the task of modeling the language capacity. It most closely reflects our abilities. Finite uses within an infinity of possibilities.
It seems like the question you are really asking is whether there are infinitely many possible meaningful sentences in a natural language.
I think a fractal is an analogy for the infinity of possible sentences that eludes the concerns you describe above. A fractal can be defined by a relatively simple and perfectly discrete series of symbols. Because the symbols are ordered in such a way that information about the meaning of the symbols is recursively derivable from the meaning of the symbols, which then constitutes new information about the symbols, and so on ad infinitum, a discrete number of symbols can create an infinite series of distinct and meaningful sentences. The sentences are of the type "draw a dot at point x," but if one looks at a fractal it is pretty difficult to say that all you've got there is a visualization of the dull set of sentences of the type you describe by saying "This and that is that; we just have to insert numbers!" It does not require an infinite vocabulary, and while it does constitute an infinite set of sentences derivable from a single rule, that rule is not simple and its meaning (solution) changes with each iteration. Finally, without belaboring the point, it's certainly the case that the reason the set of possible sentences is infinite is that language is self-referential. Thus a recursive algorithm like a fractal works nicely as a metaphor.
Below are some notes on what you've said in articulating your question that I thought might be useful in thinking through the problem. I hope something I've said above or below will be of some use.
There are certainly, as you acknowledge, infinitely many grammatical sentences. One method of proving this would be to assign a Gödel number of 1 to some grammatical sentence (this one works fine). Next we show that it is possible to create a grammatical sentence from that sentence by virtue of its being grammatical. Since it is perfectly grammatical to say "such and such is a grammatical sentence," we can see that for any base case 1 we can always create a second grammatical sentence from it, which we may then give a Gödel number of 2. So, for any sentence n, it is possible to formulate a grammatical sentence and represent it as n+1. I believe there is nothing illegitimate about using Gödel numbers in that way, and thus we have a proof by mathematical induction that there are an infinite number of possible grammatical sentences. I appreciate that this does not answer the question you are asking, but it seems like a useful step in the process.
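The induction step in this argument can be sketched as a procedure (the wording of the successor sentence is one plausible rendering of "such and such is a grammatical sentence"):

```python
# A sketch of the induction step above: from any grammatical sentence s,
# form the new grammatical sentence '"s" is a grammatical sentence'.
# Each step yields a strictly longer (hence new) sentence, so the process
# never repeats, mirroring n -> n + 1 on the Gödel numbers.

def successor(sentence: str) -> str:
    return f'"{sentence}" is a grammatical sentence'

s = "This one works fine"        # base case, numbered 1
for n in range(1, 4):
    print(n, s)
    s = successor(s)
```

Note, as Martin observes below, that the sentences produced this way grow without bound in length, which is part of what is at issue.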
The other contribution I would offer is to point out that there is something slippery in your saying, "The objects we want to point to must be discernible. And with a countable number of discernible objects we will only get a finite number of utterances." I'm not sure this is true. After all, what is the field of ontology if not, at the absolute minimum, an aggregate of sentences about what does and does not constitute a discernible object? So the limit of meaningful sentences cannot be the number of things that can be said about each discernible object (though, to take your point that there must be a difference that makes a difference, it is not clear to me what practical difference there is between the purely theoretical and supremely vast number which would be the list of all qualities of all objects and the purely theoretical, supremely vast and equally non-quantifiable notion that is infinity). To say otherwise would be to commit to saying that a conversation about whether or not x is a discernible object is made up of meaningful sentences if and only if x is in fact a discernible object. But then the sentence "x is not a discernible object" would anywhere and always be a meaningless sentence. If such sentences are meaningless, that would mean that the claim "x is a discernible object" is non-falsifiable. If "x is a discernible object" is a non-falsifiable sentence which has as its contrary a necessarily meaningless sentence, then it is difficult to see the difference between saying "x is a discernible object" and "x is a word." Given the choice, I would tend to believe that sentences in search of discernible objects can be meaningful regardless of the conclusion one makes about the object before I would believe that every word denotes a discernible object, and thus that the number of meaningful sentences is not a function of the number of discernible objects.
The above paragraph is obviously not a rigorous proof, though I think the important propositions could be formulated rigorously if one took the time. More to the point, I think they are sufficiently rigorous to show that, despite the apparently reasonable intuition that a correlation exists between discernible objects and meaningful sentences, probably there is none.
I will close as Husserl used to close his letters to Frege, "with the hope that our continued correspondence may contribute something to the cause of science."
-BGL
Dear Benjamin,
You contributed a lot of interesting points. I will take more time to get through them all. But let me ask you something about your last point (though I think your formulation is sufficiently rigorous to make a proof unnecessary). I fully agree that there might be no correlation between discernible objects and the decision whether a sentence is meaningful or not. My point here was not about meaning but about counting and countability. Do you agree that discernibility is then a standard we must uphold, even for ourselves, in order to know whether we are just repeating a proposition or referring to a new object?
Just a glance over the other points:
As to Gödel numbers: using them we might get ever longer chains and an infinite vocabulary. Besides, the question is whether we are then leaving natural language. By contrast, I would take e.g. "One and two are three" to be normal language.
Self-reference: might take us back to the problems of discernibility.
Fractals: (“meaning of the symbol is recursively derivable from the meaning of the symbols”)
without the chain getting longer?
Is there a procedure to encode
a) ever larger numbers without getting ever longer chains of symbols?
b) self-reference: can iteration of self-reference be documented in the expression (word, sentence, symbol) without adding symbolized information about the number of instances of referring (“layers of reference”)? What about (shades of) colors? Again we would be confronted with the problem of discernibility.
I’m looking forward to receiving good news from your side, take care
Martin (Lexicon of Arguments).
Dear Martin,
Thank you for your thoughtful reply! Your response has gone a long way toward helping me more fully understand your question and its significance. I've printed out your response so I can reply with the same level of care that you have taken in developing your questions, and will post as soon as I possibly can.
Just now it occurs to me that Wittgenstein's suggested philosopher's salute, "take your time" could well serve as the entire contents of an excellent instruction manual bearing the title, "How To Do Philosophy" :)
Martin, would it not seem that all you have to do is count? The number of meaningful sentences is a (small) subset of the overall number of sentences, just as the number of meaningful words is a narrow subset of the number of possible words.
For languages which assemble words from letters the number of finite-length words (whether meaningful or not) is finite (simple combinatory analysis) and the number of ways to put those words together within a finite-length sentence (or book) is also finite (albeit admittedly very large).
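The "simple combinatory analysis" has a closed form: over an alphabet of a letters, the number of strings of length 1 through n is a geometric sum. The values a = 26 and n = 10 below are illustrative, not claims about any particular language.

```python
# The combinatory analysis in closed form: over an alphabet of a letters,
# the number of strings of length 1..n is
#   a + a^2 + ... + a^n = (a^(n+1) - a) / (a - 1),
# finite for every finite n. (a = 26 and n = 10 are toy values.)

def string_count(a: int, n: int) -> int:
    return (a**(n + 1) - a) // (a - 1)

# Sanity check against the direct sum:
assert string_count(26, 10) == sum(26**k for k in range(1, 11))
print(string_count(26, 10))   # very large, but finite
```

The same formula applies one level up, with words as the "alphabet" and sentences as the strings, which is the argument of the post above.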
For languages that use ideograms rather than letters, unless the number of ideograms is infinite you'd end up with a likewise finite number of possible combinations.
In all scenarios the number of outcomes is very large but finite. The same applies to music, by the way, unless you admit musical notes with a continuous range of vibration modes rather than fixed Hertz values per note.
Hi! If you consider all the sentences that people have said or written, you get a finite set, even if you keep gathering them 1,000 years into the future. Instead, I think we should use the meaning of "infinite" in somewhat the same way as in natural induction. An infinite number of sentences (with or without meaning, as you wish) does not necessarily mean that we want to list all of them; rather, one can always form a sentence (at any time, independently of the set of all sentences gathered so far) that is not in that set yet (of course, in this way the new sentence depends on the set).
Another thing is that life and language are developing and changing; new concepts, and thus new words, appear...
The idea that languages have an infinite number of (meaningful) sentences may have some special meaning in this question. Perhaps the special point could be developed. In any case, the usual claim just depends on the possibility of grammatical composition based on (or generated from) meaningful components. One argument works this way: Suppose that A and B are meaningful sentences of L, and that L allows compounding by means of logical connectives. If so, then if A and B are meaningful sentences of L, then so are A & B, A v B, not-A, not-B, etc. But if these compounds are meaningful, then so are compounds of compounds, etc. But there is no end to such compounding, and in consequence L has an infinite number of meaningful sentences, since new examples can always be generated. Whether some of these compounds have the same meaning is a different question--or so it seems.
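The compounding argument can be sketched as a procedure. This is purely illustrative; the point it shows is only that each step yields a strictly longer string, so no finite list can contain all the compounds.

```python
# The compounding argument as a procedure (illustrative only): starting
# from meaningful sentences A and B, each step produces a strictly longer
# compound, so the sequence of compounds never repeats.

def compounds(a: str, b: str):
    current = f"({a} and {b})"
    while True:
        yield current
        current = f"not-{current}"    # each negation yields a new, longer sentence

gen = compounds("A", "B")
first_three = [next(gen) for _ in range(3)]
print(first_three)
```

Whether all of these count as distinct *natural language* sentences, or as distinct propositions, is exactly what the rest of the thread disputes.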
Hello everybody, thanks for your contributions.
Let’s distinguish sentences (chains of symbols) from propositions (meanings of sentences).
Example: "I'm hungry", "J'ai faim", "Tengo hambre": three sentences, one proposition.
“I’m hungry” uttered or thought by A or by B: one sentence, two propositions.
When you do not understand what somebody is saying you might say in some special case “I understand your sentence but I don’t know which proposition was expressed by it”.
In the scientific literature there are a lot of very funny and most interesting stories about this difference (see David Lewis, "Two Omniscient Gods"; John Perry, two wanderers who lost their way independently of each other; see below).
My question was about sentences (Pinker and Chomsky ditto, I suppose) but I played around with propositions too.
Propositions are not countable anyway, as Quine demonstrated (e.g., "Paul and Elmer agree on just three things; Paul believes only one thing that Elmer doesn't believe." There are other examples in Quine, Word and Object, 1960).
After dreaming of A we cannot say “there is a thing A and I dreamed of it”. We cannot stipulate the existence of that very thing. So there is no way of deciding if two people dreamed of the same thing and no way of counting beliefs.
@ Benedek
I agree, if we allow new words we will get new sentences, of course. But that will lead to an infinite vocabulary. The question how new expressions are introduced into the language is interesting too - by propositions or by sentences alone?
For every sentence we can get a new sentence: that might even work with only two sentences. We simply have to take the context into account.
@ H.G.
I wanted to allow ungrammatical constructions as well, to avoid discussions and keep it simple. Astonishingly enough, there is still only a finite number of possible combinations.
Rules: I assumed a finite set of rules. But why not assume new rules? How will they be introduced?
Some rules reduce the number of possible new sentences, e.g. negation: for even or odd numbers of iterations of negation.
But there are different logical systems that allow for differences here, e.g. a difference between false and not true. This will produce more possible new sentences:
It is wrong to say 15 is not many is not true is false.
I strongly doubt that we should say in this case: not-not-not-not-S = S. So the choice of the logical system will also decide the size of the corpus of possible sentences.
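A toy model may make the point vivid. This is not any standard logic: the three-valued negation below is a hypothetical one that cycles F to U to T, chosen only to show that if negation is not an involution, iterated negations stop collapsing into two classes.

```python
# Toy illustration (not any standard logic): classical negation swaps T and F,
# so not-not-S collapses to S and iterated negations fall into 2 equivalence
# classes. A hypothetical negation that cycles through three values does not
# collapse the same way, so the choice of logic changes how many
# iterated-negation sentences are genuinely distinct.

CLASSICAL = {"T": "F", "F": "T"}
THREE_VALUED = {"T": "F", "F": "U", "U": "T"}   # hypothetical, non-involutive

def iterate(neg: dict, value: str, times: int) -> str:
    for _ in range(times):
        value = neg[value]
    return value

print(iterate(CLASSICAL, "T", 2))     # "T": not-not-S = S
print(iterate(THREE_VALUED, "T", 2))  # "U": not-not-S differs from S
```

Syntactically there are always infinitely many strings not-...-S; the logic only decides how many of them are equivalent.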
Link to John Perry, Two Wanderers
http://philosophy-science-humanities-controversies.com/search.php?suche3=Wanderer
Link to David Lewis, Two omniscient Gods
http://philosophy-science-humanities-controversies.com/search.php?suche3=G%F6tter-Bsp&x=2&y=12
HG, actually I would submit that there is an end to such compounding. It is easy to prove; it's straightforward set theory. You'd indeed end up with a huge number, but not an infinite one (say, exponentials of exponentials of exponentials of a finite number).
Furthermore, you would necessarily have a bijective mapping from wherever you would end up back to the initial ensemble of possible sentences: any sentence you could possibly build from sub-sentences concatenated through a finite set of operators already extant in the language would necessarily be contained in the initial ensemble already.
Interestingly, your reasoning is similar to the proof used to demonstrate that there is no end to the sequence of aleph numbers, the difference here being that in the case of aleph numbers the seed itself (aleph-naught) is already transfinite, whereas in the above case you start from a finite seed.
By the way, the endless sequence of ever higher aleph numbers, starting from a transfinite seed, remains within the "transfinite" domain (not true infinity according to some)
Dear Ransford, thanks for your further thoughts on the question. However, I don't really see your point. My argument is fully analogous to a proof that, say, the natural numbers are without end. No matter what natural number you consider (1, 2, 3, ...), it is always possible to add 1 and arrive at a greater number, so there is no final or highest natural number. Likewise, whatever compound you consider, it is always possible for it to enter into a further compound, so there is no limit to the number of (meaningful) compound sentences. If A & B is a meaningful compound, then so are not-(A & B), not-not-(A & B), not-not-not-(A & B), and so on. If A & B and A v B are meaningful compounds, then so are (A & B) & (A v B), not-[(A & B) & (A v B)], (A & B) & not-[(A & B) & (A v B)], etc. I think it is easy to see how each series can be continued without limit.
Martin,
Sorry for the delay, though I'm glad to see the conversation has gone in some really interesting directions.
The following does not address all the questions you raised, but in attempting to answer some of them I believe I have gotten us some way toward setting an upper and a lower bound on the number of possible sentences, and said some things which have implications for the level of precision with which the total number of possible natural language sentences is countable.
As to the question of upholding discernibility as a standard, on reflection I do agree. Though perhaps it will not be as satisfying as one might like, I think that conclusion is demonstrable using a proof by cases. If we grant from the outset that the question itself is meaningful, then there would seem to be two possibilities:
Case #1. That some discernibility standard is needed in order to delimit sentences which will be counted from non-sentences.
On this assumption it follows trivially that, well, some discernibility standard is needed in order to delimit sentences which will be counted from non-sentences.
Case #2. That it is not the case that some discernibility standard is required to delimit sentences which will be counted from non-sentences.
By hypothesis this claim is a meaningful sentence, as it is nothing but the denial of the proposition under consideration; i.e., if we assume the question is meaningful, then the answer is also meaningful. But if the answer is meaningful, i.e. if it is false that some discernibility standard is needed in order to delimit sentences which will be counted from non-sentences, then the proposition that this sentence is the negation of must also be a meaningful sentence. On the assumption that you will grant that at least all meaningful sentences would need to be included in a count of all possible natural language sentences, it follows that we will need to count the sentence negated by assumption in case 2. But if that sentence is to be counted, then it is one that is needed in order to delimit sentences which will be counted from non-sentences. So the proposition implied by the sentence follows from the fact of its being a countable sentence, in this peculiar context.
If we grant that the question is meaningful, then it appears to me to follow that, yes, in order to count all possible sentences we will need to uphold some standard of discernibility, because either proposition #1 or proposition #2 is true, on the assumption that we are granting the laws of non-contradiction and the excluded middle. I think it does, anyway? :)
One point that seems to be coming up over and over again in different ways in this thread is whether or not "A and B" is a new sentence, if A and B are sentences. In other words, is it always the case that a sentence that is constituted of the affirmation of two sentences which have already been affirmed is a mere tautology? If yes, then it seems like the number of possible sentences, as you suggest at the outset, is at least countable in the sense that it is a function of the number of atomic sentences which may then be combined (I'll call that number "n" in what follows).
We might say that a combination of atomic sentences ceases to be a possible natural language sentence at the point where the number of operators involved in creating the sentence is such that it is no longer possible that it could function as a sentence in natural language. That is, a natural language sentence must be a sentence that could conceivably be used by a natural language user for one among the various reasons natural language speakers/thinkers speak or think natural language sentences. I think it is not absurd to assume that there is a limit on the number of logical operators beyond which a candidate natural language sentence ceases to be a possible one (call that number of operators "m"). I am comfortable with the idea that m is an observable psychological fact, i.e. we could measure an outer bound beyond which the addition of atomic propositions will render any sentence so meaningless to a natural language speaker that it is no longer a true candidate natural language sentence. It would be very odd indeed if a sentence were counted in this context when this were the only context in which it would ever be uttered or function like a sentence within a natural language.
The act of counting all possible natural language sentences would create a sort of Platonic bloated universe problem in a way similar to the point about Quine you raise above. Another way of saying this, and I apologize if I am belaboring the point but I want to be convinced myself, is that there is something wrong with defining all possible natural language sentences as all legitimate solutions to a language game that is defined in the terms of artificial language (that is, by rules such as if A is a sentence and B is a sentence then A and B is a sentence).
Another, even more conservative way to define m could be by the number of possible words speakable between now and the end of the universe. We could come up with a lower bound for m by assuming: (a) that all words take one second to utter (any time unit works), (b) that every combination of words is a sentence, and (c) that we can always add another logical operator followed by another combination of words. Then m would be the number of seconds between now and the end of the universe minus every word, every combination of 2 words, every combination of 3 words, and so on up to n words such that n-m
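A toy version of this bound can be computed. Every number below is an assumption for illustration only: 10^17 seconds is merely an order-of-magnitude placeholder for "time remaining," and one word per second is assumption (a) above.

```python
# Toy version of the "words speakable before the end of the universe" bound.
# All numbers are illustrative assumptions, not physical claims: at one word
# per second, the total number of words ever utterable is bounded by the
# number of remaining seconds, which caps sentence length and hence the
# number of usable sentences.

SECONDS_REMAINING = 10**17        # assumed order of magnitude only
WORDS_PER_SECOND = 1              # assumption (a) from the text

max_words = SECONDS_REMAINING * WORDS_PER_SECOND
print(f"at most {max_words:.1e} words can ever be uttered")
```

However the constants are chosen, the bound is some finite integer, which is what makes this the "conservative" definition of m.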
It seems to me that we will have to distinguish two main strategies. Do we want to discuss linear concatenation (then the question was answered by Chris), or shall we take a look at what may happen with functors, operators, connectives and their impact on the building of possible sentences?
Again, Chris' answer will be the final point of that discussion, because all we can get are (linear) chains of symbols. Nevertheless it is fascinating to see how language use extends in more than one dimension. My strategy in this discussion is to take some of these dimensions into account.
@Benjamin
What an extraordinary amount of work you have invested in the last few days.
Is “A and B” one sentence more? The early logicians shortened “and” by using a dot. This dot is the good old dot between sentences. You may insert an “and” instead of the dot between all sentences in a novel and reduce the number of sentences to one.
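The dot-versus-"and" point can be stated as a procedure (the sample text is made up): replace every sentence boundary with " and " and the sentence count collapses to one, so the count depends on how conjunction is treated.

```python
# Replacing every sentence boundary in a text with " and " yields a single
# sentence, so the count of "sentences" depends on how conjunction is
# treated. The sample text is illustrative.

novel = "It was cold. The snow fell. Nobody came."
sentences = [s.strip() for s in novel.split(".") if s.strip()]
one_sentence = " and ".join(sentences) + "."

print(len(sentences), "sentences ->", one_sentence)
```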
A psychologist once told me that there is an empirical limit of seven iterations for “I believe that you believe…”
The question of what logically follows from our known truths touches on tacit knowledge and the problem of logical omniscience. We should exclude that we (explicitly) know all consequences that can be drawn from our knowledge.
Tautologies reduce our knowledge but not our speech.
I will have to read your text again. I did not get the point about the lower bound. Why should there be a lower bound at all, apart from the fact that there were so many utterances in the past?
Thanks to all!
Games, molecules and sentences: each of them is built from smaller entities: steps (moves), atoms, words... In this way they are similar: just as we can finish a game, a molecule or a sentence at a certain point, we may also continue it by choosing another move/atom/word...
There could be various approaches in formal linguistics, too. One can easily consider only finitely many sentences and say that "everything" can be described by them...
In some cases regular (Chomsky type-3) languages can represent what we want; they allow some repetitive structures, like "My uncle's friend's mother's ..."
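The regularity of that repetitive possessive structure can be shown with a regular expression, which is equivalent in power to a finite automaton. The pattern and example phrases below are illustrative.

```python
import re

# The repetitive possessive construction is regular: a single regular
# expression (equivalently, a finite automaton) accepts it for any number
# of repetitions. Words here are illustrative.
pattern = re.compile(r"My( \w+'s)+ \w+")

print(bool(pattern.fullmatch("My uncle's friend's mother's cat")))  # True
print(bool(pattern.fullmatch("My cat")))                            # False
```

The automaton needs no memory of how many possessives it has seen, which is exactly why the construction stays within the regular languages.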
Of course it can be arbitrarily long in theory (and a theorist may say that it is a correct sentence even with thousands of words). However, in practice there is a limit (probably the same as the one just mentioned by Martin, 7 iterations) that most people can catch, but no more...
Actually, if I remember correctly, finite automata were constructed to model the working of the human brain. Finite automata are capable of accepting exactly the regular languages. There are arguments that natural languages are not regular...
Also, there are various arguments that natural language is not even context-free (Chomsky type-2). Repetitive structures like "John, Mary, Bob, Peter and Sara got A, B, A-, C and B in Maths, Physics, Biology, English and History, respectively" are correct and used in some languages (Swiss German is the one most often used for such examples). But of course, if one forms a sentence with, say, 30 children, no one will catch it at first sight... So it may be useless in spoken language; however, in written form everybody can get its information (though it may take more than just reading it once)...
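The structural point behind these examples can be sketched in code. The "respectively" construction pairs the i-th name with the i-th grade and subject, a crossing dependency; verifying that the lists match in length for unbounded n requires counting, which a finite automaton cannot do, and the Swiss German cross-serial cases push the same idea past context-free power. The data below are illustrative.

```python
# A "respectively" sentence is well-formed only if its parallel lists have
# equal length, and the i-th items pair up across the lists (a crossing
# dependency). Checking equal length for unbounded n requires counting,
# which no finite automaton can do. Data are illustrative.

def respectively_pairs(names, grades, subjects):
    if not (len(names) == len(grades) == len(subjects)):
        return None                      # ill-formed "respectively" sentence
    return list(zip(names, grades, subjects))

names = ["John", "Mary", "Bob"]
grades = ["A", "B", "A-"]
subjects = ["Maths", "Physics", "Biology"]
print(respectively_pairs(names, grades, subjects))
```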
(and I have not addressed sentences about numbers or logical connectives)
So practically the answer may not be the same as theoretically... (but it can also depend on the person how long a sentence can be uttered/understood by him/her, somewhat analogously to the fact that some Chinese poets/writers have used many more characters than most people could read...)
Dear Martins, I think you pretty much hit the nail on the head here. A "recursive syntactic system" may not interest everyone, but this strikes me as evidencing a lack of interest in more mathematical conceptions of language. We surely do not use an infinite number of sentences, and I'd link meaning to usage --somewhat indirectly, but firmly. A mathematical conception of language projects meaning beyond usage, and this points to the occasional need to revise meanings in light of new usage. Still, a mathematical conception has the virtue of aiming for broadly encompassing linguistic facts or data. Likewise, there are bound to be a multitude of truth-functional compounds, of perfectly legitimate sentences, which we will never use. But to object to truth-functional logic on this basis misses the broad, encompassing character of truth-functional logic--as concerns the relations of truth-functional compounds.
Hello everybody,
We seem to have finitists and infinitists in our discussion.
@ Pedro Tiago Martins
Recursion: please keep in mind the distinction between sentences (chains of symbols) and propositions ("what was meant"). It seems to me that for even the number of repetitions of propositions to be exceeded, an endless period of time would be required.
@ Benjamin Galatzer Levy (BGL)
Iteration: I should have been more explicit here, speaking of two alternating sentences producing infinitely many possible propositions. And I made a mistake too: the minimum is three sentences to get started:
#1 (uttered by A) First sentence
#2 (uttered by B) I don’t agree with your last sentence
#3 (uttered by A) I don’t agree with your last sentence…
And so on ad infinitum: #2, #3, #2, ...
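The loop above can be sketched as a generator (the wording of the recurring sentence is taken from the example): only two distinct sentence strings recur after the opening move, yet each occurrence expresses a new proposition because its position in the exchange differs.

```python
# A sketch of the three-sentence loop: after the opening sentence, one
# string recurs, alternating between speakers A and B. Only two distinct
# sentence strings appear after turn #1, but each utterance expresses a
# new proposition because its context (its place in the exchange) is new.

DISAGREE = "I don't agree with your last sentence"

def exchange(first_sentence: str, turns: int):
    utterances = [("A", first_sentence)]
    for i in range(1, turns):
        speaker = "B" if i % 2 else "A"
        utterances.append((speaker, DISAGREE))
    return utterances

for speaker, sentence in exchange("First sentence", 5):
    print(speaker, sentence)
```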
Solution: context: - But what a poor concept of context is used here by saying that an even or uneven number of repetitions makes the difference.
@ Pedro Tiago Martins and H.G. Callaway
Rules, games: according to Wittgenstein the rules are the game. But there may be rules that are more peripheral. Those rules that prevent a game from being infinite seem to be more practical and thus not characteristic of the game.
Chess: let's put the question another way, one that is independent of time: how about the number of images of possible constellations? The positions will be identical for repetitive moves. This reduces the number of possible images (sentences).
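A crude bound supports the point that counting positions, unlike counting move sequences, is time-independent. The bound below deliberately ignores legality (and castling/en passant state); it only treats each square as empty or holding one of 12 piece types.

```python
# Crude upper bound on chess "images" (board constellations), ignoring
# legality: each of the 64 squares is empty or holds one of 12 piece types,
# so there are at most 13^64 board images: enormous, but finite and
# independent of how many moves were played to reach them.

SQUARES = 64
STATES_PER_SQUARE = 13   # empty + 6 white + 6 black piece types

upper_bound = STATES_PER_SQUARE ** SQUARES
print(f"at most 10^{len(str(upper_bound)) - 1} board images")
```

Different move sequences reaching the same image correspond, as the next post notes, to different "propositions" expressed by one "sentence."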
There is also a component for the context in games: a position might be attained by different sequences of moves. This corresponds to the distinction between sentences and propositions in a language.
…And a master of none.
My career was steeped in philosophy, blossomed in linguistics, and was propelled by computer programming. Specifically, I programmed in a linguistics project through the philosophy department in the very early (mainframe) years of trying to teach computers to use natural languages.
What you have said, CJ, echoes many computationists on a couple of related threads on ResearchGate. You are looking to maneuver characters and words and to set limits on character strings with permutations based on those elements. I understand the mathematical inspiration, but the linguist in me, who is most interested in modeling language as a window to human thought, finds all that to be on the fringe of relevance. Indeed, characters (letters) don’t matter at all, since language is fundamentally a spoken activity and characters have only a remote correspondence to sounds, especially in English. Written words are a convenience of sorts. But the constraints on sentence length proposed by linguists are not calculated in sounds or words. They are counted in more abstract elements that are at the heart of understanding the language capacity.
The generative tradition uses recursion to open infinite possibilities, but many constraints on length are calculated in number of recursions in specific areas, such as embedding clauses within a sentence. The length of the clauses has much more freedom, and again it is not calculated in characters or words, but is constrained by the abstract elements in the clause, such as the configuration of noun phrases and verb phrases. I hold that any constraint of this sort has some psycholinguistic reality behind it, so it is not a matter of mathematical constraint or syntactic constraint, but a matter of cognitive constraint.
So I side with the mathematicians being comfortable with having the syntax generate infinite possibilities, but look to constraining the number of potentially used sentences with devices that make claims about human processing limits.
Eloquently expressed, Glenn. I was confining my response to the question concerning the infinity of sentences, which we appear to agree are not infinite: I conclude so mathematically, while you do so psycholinguistically, cognitively, or on the basis of "human processing limits." Still, beyond any of those human processing limits, a mathematical limit supersedes all other limits, including the cognitive.
It seems worth mentioning that no one in this discussion has made a distinction between structural and generative recursion, that I've noticed anyway.
I've had generative recursion in mind throughout this discussion but based on some of the examples and responses I've seen, I suspect that many of you have structural recursion in mind.
For the purposes of furthering this discussion, may I suggest that, going forward, we explicitly state what kind of recursion the author has in mind? I think it is a very important distinction.
-BGL
Martin,
Some events in my personal life intervened and have taken up all my free energy, so I haven't had it in me to stay involved in RG. Last I was on, if I remember right, I was debating how to gently say that if no one wanted to bother distinguishing between generative and structural recursion we weren't gonna get very far very fast. I remember you asking a question about a lower bound I proposed. I think my answer would have been that the minimum bound of a natural language would be more or less equivalent to (returning to the beginning) the number of discernible objects described by that language; put another way, ostension seems to me to be something like a minimal requirement for something to qualify as a natural language. Where speaking subjects are pointing at objects, there you've got a natural language.
The point there would just be that there are certainly true statements about the quantity of natural language sentences, and from those statements we can conclude that the number is finite. It seemed to me that there was a conflation of countable and finite going on (should it happen, there would be a finite number of human persons who live after our species has lost counting as a technology, but they would not be countable; it's a silly example, but I'm in Dublin and a bit jet-lagged). I'm tempted to go a bit poetic (or late Wittgenstein meets Walter Benjamin?) and say that the number of sentences possible in Latin was approximately the number spoken. The limit of the empire, the culture, of Rome and the Pax Romana may well have been Latin. Maybe that's silly, just a thought. Though it does raise the point that, unlike artificial languages, natural languages are always some particular natural language, and perhaps it only makes sense to ask about the countability of German, French, English, or Latin? I dunno, everything after the parenthesis feels fishy.
In any case, I'm moving into computer science now, but my background is in philosophy and I'd ultimately like to be at the intersection of the two, so if your "harsh" post was to correct me, or even to tell me I've missed the point and ought to get back up to date on the literature before I presume to engage in serious (though not solemn, happily) conversation, I assure you that you'd get nothing but my gratitude in response.
For me, being a philosopher is the academic analogue of the suitor who only smiles and touches his freshly slapped cheek because his beloved has touched him. If I'm wrong, I'd love nothing more than to know it. Don't get me wrong, I like being right more than I like being corrected, but the point remains that, unless he's a complete fool who'd rather feel right than be right, a man corrected is, once corrected, just that: correct.
By the way, I had a chance to look at the Lexicon of Arguments project you're involved in. It looks genuinely interesting. I've thought for some time that the major problem with philosophers is that we haven't figured out how to do collaborative work. Much as I appreciate the scientific method, I think we shot ourselves in the foot by letting the Science Wars be decided by the Sokal hoax.
As silly as some of the stuff being said was, I can't help fantasizing that the conversation had settled on the simple, obvious fact that one cause of the triumph of science in the last century must have something to do with the fact that labs are places where a bunch of scientists hang out in the same room and dedicate themselves to a common goal. On the other hand, the mark of a truly successful philosopher is that it's almost impossible to pop by his office and suggest that your work has anything to do with the really profound stuff he's doing. For all the philosophy of science we've done over the years, it doesn't seem to have occurred to anyone to do philosophy like scientists.
Tschüs,
-Ben
Ben,
I think all Martin requested was a rundown of the difference between structural recursion (what I think I presented as the basis for syntax in natural language) and generative recursion (what I imagine I favor for natural language semantics) with some examples.
Glenn
Ben,
One of your side notes inspired another thought on the general topic.
If we take Latin to be the sum of all the sentences spoken in the history of Rome as well as those spoken in the various Roman Catholic churches for some centuries later, etc., etc., and agree that, for all intents and purposes, Latin is a dead language having no native speakers, then it is fair to say that a compilation of those sentences constitutes the Latin language. What is Latin? This finite list of sentences.
But as a linguist, I want to go a step further and ask another question: What constitutes the capacity to speak Latin? Let's suppose for a minute that lasers were invented before the fall of the Roman Empire. There are certain rules about coining new words in languages, and one must be chosen. Then a Latin speaker knows how to pluralize the noun and knows to use it like other nouns in sentences assigning proper case. The speaker knows how to form a verb from the noun. Once a verb form is established, there are ways to present it in different tenses and aspects and moods. These forms are not among the list of Latin sentences anywhere, but the speaker has this capacity to use this new term for a new concept in the language. So the ability to introduce new terms for items presents new cases that are not empirically listed. This increases the list of possible sentences of Latin to some much larger number, or if one believes that the number of concepts is limitless, then to infinity.
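To make the point vivid, that word-coining capacity can itself be sketched as a small rule system. (A toy sketch only: the coined stem "laser-" and the simplified second-declension endings below are invented for the illustration; real Latin morphology is far richer.)

```python
# Toy second-declension endings (masculine), singular and plural;
# a hypothetical simplification for the example, not a full paradigm.
NOUN_ENDINGS = {
    "nominative": ("us", "i"),
    "accusative": ("um", "os"),
    "genitive":   ("i", "orum"),
}

def decline(stem):
    """Produce case forms of a newly coined noun from rules,
    not from any list of attested sentences."""
    return {case: (stem + sg, stem + pl)
            for case, (sg, pl) in NOUN_ENDINGS.items()}

forms = decline("laser")
print(forms["nominative"])  # ('laserus', 'laseri')
```

None of these forms appear on any list of attested Latin sentences, yet the rules deliver them on demand, which is exactly the capacity at issue.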
Furthermore, what is more amazing is that a child can deal with Latin (or any other language the child is exposed to) after having the input of only a minuscule fraction of the list of sentences. From that the child can generate some of the already used sentences as well as generate or understand a multitude of new sentences, each needing to be added to the list when created. And no one explains what an adjective or verb particle is to the child.
Since we have no idea what subset of the total of the sentences on the list will be used by a particular speaker and no way (???) of determining how many new items or concepts or feelings may be in the future for speakers of those languages, I find it appropriate to model that capacity for a language (as well as the capacity for language) with a mathematical device that generates an infinite set of possibilities. If it is a good model, it will include that big list of all the competently used sentences. If it is a good model, it will eliminate the sentences that native speakers deem ungrammatical. It will also allow for infinite possibilities beyond the previously "used" sentences.
Glenn
Ben,
This removed contribution of mine was in another thread. It had nothing to do with our discussion here. I mentioned it because you are following me and I guessed that you might have read it. The content that I deemed “too harsh” in retrospect was an invented example about researchers quoting each other. Be assured that it had nothing to do with anyone here or in any other thread. It was an invented little story, and I found it funny at the time I was writing it. Later it didn't seem so funny to me anymore. That was all.
The upshot of this little story was the question whether an author who does not quote all available sources should say that he "showed" a solution, or should only say that he claims or proposes one.
Glenn,
Excellent example and a good explanation of what a mathematical model may be used for.
By the way, there is this logical problem with never-uttered sentences. I just mention it; you know it, of course. It is the difficulty of attributing properties to non-existent objects and of assigning truth values to those attributions. It seems to be avoided when you write down such a sentence and say “this sentence was never uttered”, but then you are uttering it in a way.
What would a mathematical solution look like for, e.g., “there are more than n never-uttered sentences that might be uttered in the future”?
It seems to me that the more you fix their possible forms, the closer you come to performing them; the less you fix their form, the less you'll be able to count them.
Martin,
The response to your question needs about a chapter in a book to begin to arrive at any conclusions, but maybe we can peck away at it.
First, let’s consider the case hinted at by Ben of just considering all the sentences ever spoken (or written—it’s not crucial to distinguish here) in a language as being that language. We dealt with this in theory, but the problem is that no one is in the position to hear them all and record them. In the time it took for me to write this paragraph, I would imagine that several hundred million English sentences were spoken, and no one kept a list. So we have already proposed a way to define a language that is pragmatically impossible to accomplish. Although it was not explicit in my discussion, the idea required a large logical assumption: IF we could count all the sentences ever spoken in a language from its inception until its demise and list them all, then that would constitute the language. We worked a little to follow that assumption to useful ends.
In reality, then, we can only round up a sample of a given natural language; we cannot delineate the entire language. That’s why we need a model.
Is this any different for any recursive mathematical system? We can’t list all the possible outcomes, but we can list a series of rules that will generate them all. [Yes, I have assumed that language requires a recursive system, but I think we have conceded that in this discussion. We have also conceded that we are not concerned with a string that is infinitely long, but only with a list of strings that is infinite.]
Next step: Mathematicians can assist me here: how would we write a totally unrestricted infinite set? Maybe with a meta-rule @ => @@, such that @ means any symbol desired. I guess we need to specify that the starter symbol—any one you want—may be in the set of outputs so there is no restriction on length of output strings. One can formulate a set of rules with any symbols desired using this meta-rule. Every other system (even though we are only interested in the recursive ones) would be a subset of this set. I will venture the idea that even the structure generated will be infinitely ambiguous.
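For the programmers following along, here is a minimal sketch of that totally unrestricted system (a toy two-symbol alphabet stands in for "any symbols desired"; the point is that every restricted system generates a subset of this output):

```python
from itertools import count, product

def all_strings(alphabet):
    """Enumerate every finite string over `alphabet`, shortest
    first. No structural constraint at all: this is the
    'totally unrestricted' infinite set."""
    for length in count(1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)

gen = all_strings("ab")
first_six = [next(gen) for _ in range(6)]
print(first_six)  # ['a', 'b', 'aa', 'ab', 'ba', 'bb']
```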
Step three: Now we need to begin to restrict the set of outcomes. On the most obvious level, we don’t want English to turn out to be the same as German or Kinyarwanda, but on a more fundamental basis, we don’t want natural languages to turn out the same as all “other” (mathematical, programming, etc.) languages; otherwise, we cannot demonstrate any claims about what makes something a natural language using our model. Doing any of that is not step three, but just the motivation for it. So how do we restrict an infinite set? Symbols and structures. So a more restricted set from the one above would be one generated from a => aa. In this case, the number of symbols is restricted to one, a loss of freedom from the previous system. Let’s loosen up a little more: a => ab. We have more symbols, but we have also set a pattern for the structure of the tree. It will continue to branch to the left, but not be able to branch to the right. Without the trees, it is a bit hard to visualize, but if you work through a couple of iterations, it will be apparent. We can continue with these types of restrictions to a system.
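The two restrictions can be sketched the same way: repeatedly rewriting the leftmost symbol shows how each rule narrows the output (again just an illustrative toy, not a claim about any particular formalism):

```python
def derive(start, lhs, rhs, steps):
    """Apply the rule lhs => rhs to the leftmost occurrence,
    collecting each intermediate string of the derivation."""
    current = start
    history = [current]
    for _ in range(steps):
        current = current.replace(lhs, rhs, 1)  # leftmost rewrite only
        history.append(current)
    return history

# a => aa: one symbol only; strings of a's of every length
print(derive("a", "a", "aa", 3))  # ['a', 'aa', 'aaa', 'aaaa']

# a => ab: two symbols, but the derivation always extends on the
# same side, so the tree branches in only one direction
print(derive("a", "a", "ab", 3))  # ['a', 'ab', 'abb', 'abbb']
```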
More to come.
Glenn,
*I started writing this post this morning, before your most recent post. I think your last post may make some of what I say below redundant as you've beaten me to the punch on a couple issues. Please forgive any redundancy.
Apologies for any confusion I caused. Martin suggested I post some of what I said in an email to him on this thread since it was pertinent and in haste I failed to properly edit my response so that it fully made sense in the current context. Still, I'm glad of whatever confusion inspired your post, even if my fumbling was the origin. :) What you say at the end of your post gets to the crux of one among the issues I take to be central to this conversation. You write,
"I find it appropriate to model that capacity for a language (as well as the capacity
for language) with a mathematical device that generates an infinite set of
possibilities. If it is a good model, it will include that big list of all the
competently used sentences. If it is a good model, it will eliminate the
sentences that native speakers deem ungrammatical. It will also allow for
infinite possibilities beyond the previously "used" sentences."
I think it's worthwhile to try to imagine how we might build such a device and to take a look at some problems that crop up as we do. I say "we" genuinely, by which I mean I'll give it a go, and if it seems like it might be fruitful to improve or clarify my initial rendering, let us by all means do so.
My previous proposal could, I suppose, be expressed as a device, but a very cumbersome and limited one. Its axioms would be a complete list of every Latin sentence ever uttered, and its rule for checking whether or not a sentence S is a possible Latin sentence would be: is S one of the axioms listed?
I take you to be saying that that device is no good because, while it succeeds in catching all the good Latin sentences used thus far, it doesn't do us any good if we came across, for example, an illegible manuscript and wanted to discern whether or not it was written in Latin. Suppose our imaginary manuscript reads, "Caes?? non est supra grammaticos!" The machine I've got could conclude that there is a sentence, "Caesar non est supra grammaticos!", which is good Latin by axiom n (i.e., that axiom which is the once-rendered sentence "Caesar non est supra grammaticos!"). The sentiment is clearly meant to imply that no one est supra grammaticos, and not that Caesar specifically stands in that particular relation to the grammarians. That is, in both a semantic and a syntactic sense, the first word, "Caes??," properly refers to any particular person at all. So, according to my device, that sentence is a proper Latin sentence both only if the first word is "Caesar" (for the sake of argument I see nothing wrong with assuming no similar sentence had ever been written about someone else whose name or title starts with the letters 'c,' 'a'....) and only if the first word need not be "Caesar," as that is the meaning of the sentence. So I'm stuck with a machine that either produces a paradox or shrugs and says that the fragment is Latin-undecidable, i.e., it must remain agnostic wherever there is no axiom n that validates the sentence. I can appreciate how this device would be unsatisfying.
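For concreteness, the whole cumbersome device fits in a few lines (the axiom list is of course a made-up stand-in, not a record of actual Latin):

```python
import re

# Hypothetical stand-in for "every Latin sentence ever uttered";
# both entries are invented for the example.
AXIOMS = {
    "Caesar non est supra grammaticos!",
    "Alea iacta est.",
}

def check(sentence):
    """The cumbersome device: a string is Latin iff it is an axiom."""
    return sentence in AXIOMS

def check_fragment(fragment):
    """For a damaged manuscript ('?' marks an illegible letter),
    the best the device can do is report which axioms happen to
    match; it has no principled way to decide the general case."""
    pattern = re.compile("^" + re.escape(fragment).replace(r"\?", ".") + "$")
    return [a for a in AXIOMS if pattern.match(a)]

print(check("Alea iacta est."))                           # True
print(check_fragment("Caes?? non est supra grammaticos!"))
# matches the Caesar axiom, but only because that exact token
# happens to be on the list; otherwise: Latin-undecidable
```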
I think the device you're describing is one that is, in the terms of set theory, both complete and consistent. That is to say, it is a theoretical machine that would generate all and only possible Latin sentences. Presumably such a machine would include a set of axioms in the form of some list describing the syntax and the semantics of the language in addition to some rules regarding irregular verbs and other such exceptions to the general syntax, some rules for adding new words, and some rules for eliminating words (after all, I speak English, not the history of English). This is as far as I think we can get without running into trouble. Consider some of the decisions we'll have to make:
1. What language will we use to describe the device? Latin is one obvious choice, but that would send us spinning into an infinite regress almost immediately. After all, there is no word in Latin for such a device, and the moment we made up a word for it, that word's meaning would change (to include that word). We'd have what some Plato scholars have called "a bloated universe problem." To be complete we'd have to posit endless new vocabularies for describing our describing machines. Another way to think about this same problem is to consider sentences of the type "such-and-such is not a Latin sentence." In order for our device to be complete, we would need an axiom stating that any non-Latin sentence must be expressible in a good Latin sentence by adding the words "is not a Latin sentence." To keep the baby we'd need to save not only the bath-water, but also the river it was drawn from, and any combination of phonemes that does not describe the river it was drawn from. So we couldn't describe the device in Latin, because the consequence of doing so would be incompleteness.
2. Suppose then that we do not describe the device in Latin. We will still need to describe it in some language. I think it is a pretty safe bet that that language cannot be any other natural language, as the essential characteristic of natural languages (the one that made the notion of artificial languages necessary) is that they are messy and unstable in just the way one would need to avoid if one is after a rigorous description.
3. So we're left with no choice but to describe our machine using an artificial language. I gather that this is what you had in mind in the first place? But is such a device possible? Consider again the formulation "such-and-such is not a Latin sentence." This time, however, instead of considering it from the standpoint of a Latin speaker, consider it as a construction of our artificial language. By assumption, the artificial language used by our device must be able to express any proposition of the type "This sentence is not expressible in Latin" or "Any sentence containing this expression is not expressible in Latin." If it could not express those propositions, it would not be complete, as there would, ipso facto, be sentences which our language is unable to designate as either possible or not possible Latin sentences. But here we are again forced into contradiction! It would, after all, be a true fact about the Latin language that the sentence cannot be expressed in Latin, but one arrived at by developing a device that exhaustively describes the Latin language. We would thus be in the position of having to say that there are propositions about the Latin language, derivable from other true propositions of the Latin language, which cannot be expressed in the Latin language; but wouldn't that violate the consistency axiom? Put another way, wouldn't this fact stand in precise contradiction to the assumption that such a device could exist at all?
Ben,
Your response is on the mark for a forerunner to my previous post and additions yet to be formulated. So now read my last post that picks up from here and begins to try to make some sense of the whole thing.
You have brought up several problems with mathematical systems that are not rich enough to handle natural language as well as some other associated problems, so we will see if in the end we can defeat them.
In the meantime, however, just as we can have a meta-mathematics, I see no problem with having a meta-language to discuss the problem. For our exercise, we have noted (or stipulated) that Latin is now a dead language with no native speakers, so there is no reason for anyone to speak it. Let's, for the sake of clarity, leave the set of Latin sentences alone for now, not allowing any more sentences to be part of it. We'll use English as our meta-language to discuss Latin...aren't we? ;-)
The system you propose is the Microsoft spellcheck system. There is a list of properly spelled words, and if a word that is typed is on the list, it is passed, and if it is not on the list, it does not pass. Note that the word "spellcheck" is not on the list, so it is marked ungrammatical. My name turns to "Gingham" if you follow MS, and there are other problems. If I understand you right, you want to expand on that and not only list words, but list sentences, so if the sentence is on the list, then it is Latin and if it is not on the list, it is not. (The spellchecker on my online class software does not recognize the word "online.")
The problem is that one can misspell a word, and if the word is on the list for some other reason then the misspelling is missed. "There frogs are bigger then mine" has 2 mistakes in it, but the look-it-up-on-the-list method does not detect that. Likewise, one may intend to say in Latin that the sun is out, but may use the wrong words and end up with something absurd, such as "I want to eat grass." If the sentence is on the list, however, it will pass, but the usage of the sentence is wrong. My conclusion is that this simplified system is inadequate to sort the good sentences from the bad. I have yet to propose what to do about it, but I am trying to walk through the steps one by one, laying a strong foundation, so if anyone has questions about the first cautious steps, let's tackle them now.
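The failure is easy to demonstrate with a toy word list (invented here for the example):

```python
# Toy spellcheck dictionary; real spellcheckers use much larger lists
# but the failure mode is the same.
WORD_LIST = {"there", "their", "frogs", "are", "bigger",
             "then", "than", "mine"}

def spellcheck(sentence):
    """Flag any word not on the list. A list lookup cannot see
    that a correctly spelled word is the *wrong* word in context."""
    return [w for w in sentence.lower().split() if w not in WORD_LIST]

print(spellcheck("There frogs are bigger then mine"))  # []
# Both errors ('There' for 'Their', 'then' for 'than') slip through,
# because each word, taken alone, is on the list.
```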
Glenn, Ben
I feel a bit uncomfortable that you take the effort to write such long posts, but they are most enlightening and helpful for me, and I think they will be very interesting for others too. More than once both of you answered questions before I could ask them. That happened with the distinction between meta-language and object language, with recursive rules, and with Latin as a language that is no longer expanding. So I hope you will go on.
As to the problem of truth values when ascribing properties to non-existent objects: this problem becomes obsolete when we consider models like the one you have drafted, Glenn.
A very interesting point is that in an ever-increasing set of possible sentences, one language could turn out to be the same as another, which is to be prevented by restricting the possible symbols.
Ben,
A sentence describing a sentence of another language may quote that sentence. But will it incorporate it? I think it will at first contain the quote as something equivalent to noise. E.g. this English sentence:
“It would be an ungrammatical sentence in Latin to say [Caes?? non est supra grammaticos!].”
Will the distinction ever collapse? Maybe it will, if quoting Latin prose becomes a daily exercise?
Bloated universe problem: if we need no translation procedures, but only grammatical rules for the object language described in the meta-language, then perhaps we might avoid the problem? I am not sure about this. Maybe we avoid it if we do not allow extensions of the object language (Latin considered as a dead language)?
Martin,
You have anticipated my next move. Although I may not be able to fully articulate it yet, I will toss it out there now. In respect to the system, the idea is to generate all the possible sentences of a language, in this case Latin. It comes closer to my concept of modeling the capacity to do Latin rather than just describing Latin. In that sense, it has explanatory power rather than just descriptive power. It allows us to suggest why certain groups of sentences are OK and why others are not.
As far as truth values, then, all sentences are potential sentences in the model. The best we can hope for is a sampling of the actual sentences in the language, but only actual utterances will require truth values--or some of them will. ("Hello" does not need a truth value.)
It is akin to having thirty five chairs in a room and successive groups of 25 people coming in to sit. It does not matter which 25 chairs are occupied because they are all potential seats. It may be that some never get used or some get used by every group; it doesn't matter. In that regard, then, we want to generate enough grammatical sentences to handle a language, and to do that we need to generate an infinite number of possibilities from a finite set of objects. We don't know which will be used, and we know trivially (because it is an infinite set) that some will never get used, but we still need to have those as "available" prospective sentences.
It is also important to note that at this juncture, I am only claiming to model the syntax of language: the linear order plus the structure of how the words are related, that is, the tree structure that can be mapped from the phrase structure rules. I am not (yet) claiming that the system will model phonetics, phonology, morphology (the inner structure of words), semantics, pragmatics, diachronic considerations (a fancy word for history), sociolinguistics, and whatever sub-fields one may want to delineate. Indeed, there will be some division among scholars about how to handle some of these, especially semantics.
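As a minimal sketch of what I mean by modeling just the syntax (the rules and vocabulary below are invented for the example), a single recursive phrase structure rule already delineates an unbounded set of prospective sentences from a finite lexicon:

```python
import random

# Toy phrase structure rules and lexicon, invented for the example;
# "NP -> NP PP" is the single recursive rule.
RULES = {
    "S":  [["NP", "VP"]],
    "NP": [["Det", "N"], ["NP", "PP"]],
    "VP": [["V", "NP"]],
    "PP": [["P", "NP"]],
}
LEXICON = {
    "Det": ["the", "a"],
    "N":   ["senator", "scroll", "forum"],
    "V":   ["reads", "sees"],
    "P":   ["near", "in"],
}

def generate(symbol="S", depth=0, max_depth=4):
    """Expand `symbol` top-down by the phrase structure rules.
    Past max_depth only the first (non-recursive) expansion is
    allowed, so every derivation terminates."""
    if symbol in LEXICON:
        return [random.choice(LEXICON[symbol])]
    options = RULES[symbol]
    if depth >= max_depth:
        options = options[:1]  # non-recursive expansion only
    words = []
    for sym in random.choice(options):
        words.extend(generate(sym, depth + 1, max_depth))
    return words

print(" ".join(generate()))
```

Without the depth cutoff the recursive NP rule generates sentences of any length, which is exactly the "infinite possibilities from a finite set of objects" point; the cutoff here only models a processing limit of the sort discussed earlier.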
Here is something you folks might enjoy. It talks about the number of sentences of a language used, prospective, etc.
http://languagelog.ldc.upenn.edu/nll/?p=11619#more-11619
Glenn
That’s almost incredible! It’s amazing that it is true even for trigrams. An interesting point is what the author writes about mathematical methods of the 20th century being inappropriate for these problems. I would like to know something about more recent methods (I wonder if I'll get the point, not being a mathematician).
Geoffrey Pullum, the author, is not a mathematician, but a linguist, so I am sure you will get it. On the other hand, it takes some advanced math to model linguistics, such as the processes we have been discussing.
Of course, this article came along just in time for my next point: we CAN use the target language of study as the meta-language of study. We can take samples of that language since even if it were possible to record all the sentences of a language as we proposed in our thought experiment in this thread, our list will fall far short of indicating the possible grammatical sentences of the language (the point of the article linked to). Therefore, it is necessary to abstract away from the list of actual known sentences and propose grammatical rules--that is, use a mathematical model--to delineate prospective sentences. One added source of data comes with native speaker intuitions. Knowing that everyone's built-in grammar has slight differences from others, it is a bit tricky, so "clear" examples that many people agree on work well, and even sometimes finding sentences that are "iffy" or "awkward" for some speakers might shed light on the internalized rules of grammar and how the rules (rather than the sentences) relate.
The followers of this thread may find the following short article interesting:
P.M. Cohn, From Hermite rings to Sylvester domains, Proceedings of the American Mathematical Society 128 (7), 2000, 1899-1904.
http://www.ams.org/journals/proc/2000-128-07/S0002-9939-99-05189-8/S0002-9939-99-05189-8.pdf
See the Abstract and the remark on page 1901:
…The class of Hermite rings, which is clearly definable by elementary sentences, cannot be defined by a finite set of sentences.
In effect, infinitely many sentences are needed to define Hermite rings.
James,
Thanks for joining the discussion. I would like to say that the article lost me somewhere around weak finiteness, but my mind is actually still bouncing around the abstract. ;-)
If you could hint to me how Hermite rings, skew fields, and the like would help us understand natural languages, I might be inclined to extend myself a bit to understand the math. I am not sure of the relation between an infinite set of mathematical "sentences" defining a ring and the infinite set of potential sentences in a model of a natural language. What am I missing?
James,
Thank you for the interesting link. I did not quite get the relation to linguistics. Maybe I will have to go deeper into it.
Glenn,
I apologize about not getting back to you sooner than this.
A mathematical sentence is usually understood to be a proposition that is either true or false. The article that I found observes that an infinite number of sentences are needed to define Hermite rings. I thought perhaps since there is a need for an infinite number of sentences in this instance, we can then consider whether there is a need for an infinite number of natural language sentences in defining some natural language construct.
For example, from Joyce's Finnegans Wake (p. 6), we have
http://www.chartrain.org/PDF/Finnegans.pdf
"The oaks of lad now they lie in peat yet elms leap where askes lay."
Substitute "answers" for "elms" and substitute "questions" for "askes" (the plural of "aske"). Then we can, perhaps, define "answers" recursively, if we agree that there is an unbounded number of "askes" (questions) giving rise to an unbounded number of "elms" (answers).
Try
An answer An arises from an "aske" (question) Qn,
an answer A(n-1) arises from a question Q(n-1), and
an answer A(n-2) arises from a question Q(n-2),
and so on ad infinitum, if we let n equal the cardinality of the natural numbers.
James,
Well, we slipped from syntax into semantics (I think), but that is OK. So are you proposing that for every question there is one distinct answer? That is what I read from the symbols (although the translation from Joyce seems to be different).
It seems to me, given a question, there is an array of answers, some true and some false. In that way, each single question can give rise to a potentially infinite number of answers. If questions are infinite as well, then we have an infinity of infinities. (This more closely corresponds to the natural language problems we've discussed.)
Just for fun, let's squeeze the metaphors out of Joyce and be a bit more literal and exact. What do you believe?
1. Every question has exactly one answer.
2. Every question has a set of answers.
3. Every answer has a set of questions.
4. Questions, in general, have answers.
5. Something else.
And which of these (one or more) underpins the statement from Joyce? Put another way, which of these statements is a reasonable attempt at a semantics for the cited sentence?
gjb
Glenn,
From the recursion, I think what we have is two sequences:
Q1, Q2, Q3, …, Qn as n tends toward infinity
 |   |   |      |
 v   v   v      v
A1, A2, A3, …, An as n tends toward infinity
James,
I think you answered my question symbolically: there is a one-to-one correspondence between questions and answers, and the number of both--I love this phrase--tends toward infinity.
I am not certain if you are stipulating that the answers be right or "true" answers to questions, but it seems that is what you have in mind. So consider this: "Who is that guy in that picture?"
1. Dane.
2. My only brother.
3. A glass container model maker.
4. The husband of my sister-in-law.
:
n-19. My seventh cousin's seventh cousin.
:
n. tending toward infinity.
These are all true answers to the question, but the correspondence is not one-to-one, and they are different kinds of answers: a name, a definite description, a member of a class, and an analytic answer based on #2 and additional information about marriage.
Does this fit your scheme?
Hi!
About infinite and finite, we may also have a middle version, "practically infinite", which comes from an engineering point of view. Even in chess there is a finite number of possible games (this is actually achieved by artificial restrictions: the rules of chess declare a draw if the same position occurs three times, or if there are 50 moves without any irreversible step, i.e., a pawn move or a capture). Yet this finite number is still huge (more than the number of atoms in the universe), so practically there is no way to list all of them (although theoretically we may have an algorithm that produces every possible game one after the other). It is a recursive set (actually every finite set counts as a recursive set), but we have no time to wait for the listing, and of course there is not enough space in the world to write down all possible chess games.
So "practically infinite" means that, from a computational point of view, the set can be treated as infinite, even though it is finite.
So I just wanted to say that even if we impose some artificial constraint on the recursion of sentences in a natural language (say, no sentence may contain recursion deeper than 30, since no one can understand it), we still have a very large number of sentences, which can practically be seen as an "infinite set". (If the set were practically finite in the sense of being listable, there would be no problem with machine translation etc.: we could have a database from which any sentence could be translated into the other language. That is the engineering point of view.)
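The "practically infinite" point above can be made concrete with a short back-of-the-envelope computation. This is only a sketch: the 170,000-word vocabulary, the 30-word cap, and the atoms-in-the-universe figure are illustrative assumptions, and the count includes ungrammatical strings, so it is a crude upper bound rather than a count of sentences.

```python
# Count all word sequences of length 1..30 over a 170,000-word
# vocabulary (illustrative figures; ungrammatical strings included).
VOCAB = 170_000
MAX_LEN = 30

total = sum(VOCAB ** n for n in range(1, MAX_LEN + 1))

ATOMS_IN_UNIVERSE = 10 ** 80  # common order-of-magnitude estimate

print(f"about 10^{len(str(total)) - 1} sequences")
print("finite, yet beyond any exhaustive listing:", total > ATOMS_IN_UNIVERSE)
```

The set is finite and even trivially enumerable, but its size dwarfs any physical listing, which is exactly the engineering sense of "practically infinite".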
Another thing, just reflecting on some of the previous notes: I am not sure that in natural languages we can restrict ourselves to statements that are (strictly) true or false. This is/was one of the main ideas of classical logic, but there are several arguments that the world is not digital, not only TRUE/FALSE; there are several reasons to admit fuzzy or other values in some cases. I mention just two examples: "This house is big" (it depends on the other houses we compare it with), and the LIAR paradox "This statement is false".
Glenn,
Picking up on your suggestion that a question may have more than one answer, then we can modify the question-answer growth model to obtain
Q11, …, Q21, …, Q31, …, Qn1 as n tends toward infinity
 |       |       |        |
 v       v       v        v
A11, A12, A13, …, A21, A22, A23, …, A31, A32, A33, …, An1 as n tends toward infinity
From this, we can assert that infinitely many natural language sentences are needed to define the question-answer growth model.
@Benedek Nagy: ...I am not sure, if in natural languages we can restrict ourself to statements that are (strictly) true or false.
Yes, I agree. The question-answer growth model from my previous post makes no assumption about the truth or falsity of the answers. And, yes, this is an instance where fuzzy set theory would be useful.
An answer to a question may be partly true. The degree of truth of an answer is in the interval [0, 1]. And the number of degrees of truth in the interval [0,1] is infinite.
So the number of possible answers to the first question Q11 lies in the interval [0, n], where n tends toward infinity. And A11 (the first answer to the first question) takes one of infinitely many possible degrees of truth.
@Benedict
Your notion of "practically infinite" is akin to what has been called in this thread "potentially infinite." What seems to lie behind this concept is that although it is not practical to list all the cases (or some significant subset of them) as you say, we need to model the activity with a mathematical model that has infinite capacity and then delimit the cases with constraints. I suggested before that two main constraints would be number of symbols and limits on phrase structure rules that only allow certain structures (left branching, right branching, binary branching, etc). There are other types of constraints.
It dawns on me that a particular constraint may or may not end the infinitude from the set of projected outputs of the system. The time limit rule in chess would be a case of eliminating infinity in the overall system. Allowing castling to take place only once per side per game does not eliminate infinitude for the overall system, although it constrains the possible outcomes.
@Benedek
Pardon my Anglicization of your name. It happened to me in the mid-1700's. My perfectly good German surname was transformed from (probably) Bingorman to Bingaman to the very British Bingham. Apologies. (James, is the previous claim true? Can something happen to me 2 centuries before I was born?)
Glenn,
Many thanks for your posts about practically infinite and about names. I have a story similar to yours concerning my name. More later.
Glenn,
I was in a hurry when I wrote the previous post (my wife was calling me to do some grocery shopping before the evening traffic started).
My family name (on my father's side) was Pitr (or Pietr) up until the mid-1700s, when my ancestors emigrated to Canada. After that, Pietr was anglicised to Peters.
Is a language sufficient to implement a Turing Machine? If so, then perhaps you could apply Turing's system of analysis to decide the matter.
Ok Christopher, that sounds good.
Why don’t you delineate a language and the way the Turing machine could work with it?
This is important because when working with a computer we must know something first: We must formulate our problem. Then we can know whether we have achieved our goal.
I think that in all the famous examples with TMs, a real machine never showed up. The work is on our side when it comes to devising and describing the process.
I may not be able to answer this to the satisfaction of you mathematicians, but I will venture the following.
Chomsky disposed of strict phrase structure grammars, such as those generable by Turing Machines, as early as 1957 with his monograph "Syntactic Structures" (available at http://www.postgradolinguistica.ucv.cl/dev/documentos/49,578,Noam%20Chomsky%20-%20Syntactic%20Structure.pdf). If you care to review Chapter 5 on page 34 (about 40% of the way down), you will see that Chomsky argued not that phrase structure grammars had no possibility of generating the sentences of the language, but that in doing so, they would create such a mess that there would be no explanatory power left in the derivations; hence the birth of transformational phrase structure grammars.
Because of the multi-disciplinary nature of the audience for this thread, there might be a lack of understanding for goals in some camps. In a mathematical sense, just making the machine churn out all the sentences in a natural language is a formidable goal, but for linguists, the further target of explaining the language capacity in a principled way places further restrictions on the model.
Martin,
I didn't mean that a TM should operate upon language in some way to decide the matter. What I really meant was that if a language was sufficient to implement a TM then perhaps a language could be treated AS a TM. - Perhaps I misunderstood you.
Hi!
That is an interesting question, connecting natural languages and Turing machines. Infinity can be of (at least) two types: enumerable (like the natural numbers) or continuum (like the real numbers). So if we agree that the number of (possible) sentences is infinite, then the next question is what type of infinity we have.
The working of our brain is somewhat mysterious (even finite state machines/finite automata were originally designed for modelling the brain, in the 1950s).
If we believe that a language can be formalised, then by this formalism we may go in the direction of formal systems and Turing machines, and we have at most countably infinitely many sentences. But as I remember, I have met some arguments that we have more, i.e., continuum many sentences (though I do not remember exactly where).
Another interesting question about language and thinking: which is based on which? One may feel that everything we can think is in a (natural) language, but sometimes we have a "feeling" that is hard to describe by words or sentences. Or can you describe a painting (e.g. the Mona Lisa) by sentences?
I am not sure that we can limit our thinking to a countable set... and then we may have more sentences as well...
p.s. (it happens many times, that we have versions of names and it is very easy to transfer them to our own language and use that variation...)
Hi Benedek,
It is probably a silly question, but - are orders of infinity continuous or discontinuous? If discontinuous then what of the 'gaps' between different orders of infinity? Have THEY been quantified?
Yes, a natural language has infinitely many possible sentences. This can be confirmed by Chomsky grammars: the generative rules of these grammars can produce infinitely many sentences without halting. So this question is solved.
There is no need to mathematize language in order to argue for an infinite number of sentences in a natural language, for mathematics is just a branch of LOGOS (albeit, a very important and beautiful branch!). In a perfect world, thought and language would be analogous to a photon that is both a wave and a particle; LOGOS (p.s. I strongly oppose the Copenhagen theory, but that's for another discussion)! Nevertheless, enslavement of the body and brain/mind has made it so that we are also enslaved to a limited language. In other words, our thoughts are limited and thus our language is limited, or vice-versa. At times, we may get glimpses of freedom, in which case we realize that our language does not correspond to the level of our 'higher' thoughts. Whenever our language is at a 'higher' level than our thoughts, however, it is a mere indication of one is even more enslaved to 'the Matrix' of this world. (Perhaps you can now understand why I oppose the Copenhagen interpretation of quantum mechanics...two in one does not mean A=A).
To put it briefly, under one logical picture, our natural language does not have an infinite number of possible sentences because our thoughts are limited, and we are enslaved to a finite world. Under another logical picture, our natural language has the potential (thanks to the magic of syntax) to have an infinite number of possible sentences. This infinite potentiality of our language would be fulfilled only with the simultaneously occurring freedom from mental/physical slavery. In other words, at a bare minimum, we'd have to be with Alice in Wonderland.... :) 'Infinite' is difficult to define. When I say infinite, I have numbers in mind...and I have my own intuitive definition of it I suppose, that I cannot explain in words. As for orders of infinity in language....that deserves much investigation and unfortunately, I'm not sure that our limited human minds would even be able to truly understand it. For all we know, there could be a limited number of natural numbers but an infinite number of real numbers. Likewise, in 'Wonderland', we could have a limited number of words/syntax, but an infinite number of thoughts, thus combinations....thus 'logical pictures'. Who knows...fun topic though!
Hi Christopher!
That is a good question, whether there is anything between the countable infinite and the continuum infinite. Actually, it cannot be decided on the basis of the usual axioms of set theory; therefore one may adopt the "continuum hypothesis" or the like. One may believe that there is nothing between them; others may think of an infinite hierarchy of infinities, created, e.g., by taking the possible subsets of a set with the cardinality of the previous infinity.
On the other side, as others have already noted in this discussion, for us (people) it is not so easy to work with infinity... and there can be some paradoxical situations when infinity appears...
Dear all,
Thank you for your last posts. Sorry that I did not comment on your interesting ideas for such a long time.
There were some questions about possible orderings of sentences in a natural language and considerations of different forms of infinity. The original question was whether sentences, as chains of symbols, have to become longer and longer when we want to build more and more sentences. This question was answered by Chris Ransford (Feb 25, 2014). I conclude: the chains will get longer and longer. So for a natural language there are natural limitations on understandability, not least because of the sheer length of the sentences.
In the meantime we have seen proposals to take into account not only sentences but also thoughts (propositions). I am happy that we discuss these aspects, too. Anyway, as I pointed out in the original question, propositions are not countable. Imagine this dialog: “I am thinking of this, not of that” –“No, I bet you are thinking of that, not of this.” – Absurd to hope that we could have a solution for problems like this. What cannot be identified cannot be counted.
Infinity: There were questions about orders of infinity - the Continuum Hypothesis "knocked on the door". The form of infinity depends on the shape of the considered sets. Thus the question of the order (of infinity) brings us to the question of an ordering for the considered objects. How can we build a (mathematical) set out of natural sentences?
Thinking about infinity is fascinating but we should not forget that for different forms of infinity there should be an application that needs this form of infinity – and not another form.
Ordering of sentences in a natural language: As long as we do not have an ordering for natural sentences, it seems to me that we have no real reason to speculate about different orders of infinity. As for chains of symbols it is very simple (see above).
Sentences: could be counted when we display them all together (write them down). But this “set” of sentences is not well ordered. That means: there is no first element. Which sentence should be called “The first sentence” and by which procedure should we go over to “The second sentence”? Let’s write them down and arbitrarily attach numbers to them. But by this we are applying the ordering of the natural numbers and not an ordering that is a characteristic feature of a natural language.
I mean, we will not get an ordering that tells us anything about a feature of the language that will lead to consequences about how to go on.
When counting sentences (and any other kind of objects) we apply natural numbers, not rational or real numbers or any other kind of numbers. The reason for this is that we are not dealing with fragments of sentences.
When we say that there is "no second real number”, this is a quite different problem that has nothing to do with our natural language.
Yes, the number of propositions of ordinary language is always finite, but the possibility of reduplication in another mind, in another metalanguage, or at another time is uncountable: potentially infinite, but always actually finite. The mind is always potentially infinite, but actually finite. The actual infinite appears only in God.
Infinite vocabulary:
Assume that new words are created every day. Will this lead to an infinite set of possible sentences? Let n be the number of words in a sentence; then we have
For any n: there are finitely many possible sentences with n words.
The vocabulary is not infinite in any natural language. But even if it were, it would not lead to infinitely many sentences, because of the above "formula". There are only so many possible combinations with 2,000 elements, so many for 2,001, and so on for 2,002, 2,003…
That is, we know something about the future, namely that there will not be infinitely many sentences, unless something is changed in the definition of a sentence.
Dear Ramon,
What is density in relation to the number of possible sentences? Aren't sentences countable one by one, like sheep? Is it sensible to say that there is a possible sentence in a logical space, located between coordinates that are indicated by real numbers?
It is obvious that there are infinitely many possible sentences in a natural language (NL). A simple proof is that NL is a Chomsky type-0 grammar, with rules like:
A -> A + B, ..... for example
noun phrase -> noun phrase + B,
noun phrase -> noun phrase + B + B,
noun phrase -> noun phrase + B + B + B, ......
So there are infinitely many cycles in the expressions! In NL too.
Dear Yinsheng,
I cannot see that Chomsky proved that there are infinitely many possible sentences. It is not at all clear that a generative grammar (or other grammars) can produce an infinite number of them. This topic is discussed in Pullum and Scholz (2010), pp. 7-9.
http://www.lel.ed.ac.uk/~gpullum/bcscholz/Infinitude.pdf
Of course you can always add an expression. This makes the chains become longer and longer. That is not contested.
I believe that the sentences of a natural language can be put into 1-1 correspondence with their Godel numbers. Then there are a countably infinite (or aleph-zero) number of sentences.
I am assuming that there is no upper bound on the length of the sentences in question.
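The Gödel-numbering idea above can be sketched in a few lines. This is a toy illustration, not a serious encoding of English: the three-word vocabulary and its integer codes are invented for the example. The point is that, by unique prime factorization, distinct word sequences receive distinct numbers, so the sentences inject into the natural numbers.

```python
# Map a sentence (a finite word sequence) to p1^c1 * p2^c2 * ...,
# where pi is the i-th prime and ci is the code of the i-th word.

def primes(n):
    """Return the first n primes (simple trial division)."""
    out, k = [], 2
    while len(out) < n:
        if all(k % p for p in out):
            out.append(k)
        k += 1
    return out

def godel_number(sentence, code):
    """Gödel number of a sentence under a word-to-integer coding."""
    words = sentence.split()
    g = 1
    for p, w in zip(primes(len(words)), words):
        g *= p ** code[w]
    return g

# Toy vocabulary with invented codes (an assumption for illustration).
code = {"dogs": 1, "chase": 2, "cats": 3}
print(godel_number("dogs chase cats", code))  # 2^1 * 3^2 * 5^3 = 2250
print(godel_number("cats chase dogs", code))  # 2^3 * 3^2 * 5^1 = 360
```

Note that word order matters: the two sentences above get different numbers, which is what makes the correspondence one-to-one.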
Dear Steven,
That is an interesting idea, I hadn’t thought about it before. Up until now I have three thoughts on the subject:
A.
As in the case of Godel numbers for mathematical formulas we could display the structure of sentences. Let’s first take Godel numbers for negation, conditional etc. or for particles like “if”, “nevertheless” etc. Then we might nicely depict the linguistic structure.
B.
Godel numbers do not really depict or display the structure; they encode it. They make it "anonymous". Anyway, there would be enough GNs to count possible forms, of course. What is fascinating is that they could first count grammatical structures and then the "contents" (structures plus inserted words, when we count the words too).
C.
Natural numbers are still sufficient. There is no place for possible sentences in an assumed interval between sentence n and sentence n+1.
It seems that it is not necessary to code the sentences into Godel numbers.
If the number of sentences is finite, then the set of Godel numbers coding them is finite; otherwise it is infinite, just like the number of sentences.
So Godel numbering does not seem helpful for determining the finitude or infinitude of the number of sentences in a NL.
Godel numbers:
There is a problem when we want to count sentences by GN: Logical connectives (“if…then…” etc.) might be doubly counted, as a part of the structure and as a part of the vocabulary.
Dear Yinsheng,
Infinity/finity: I think the discussion often turns back to the question of actual or possible objects. And this happens also in the literature on this item: there are often misunderstandings because both sides are right in a sense: there are only finitely many actual objects and infinitely many (non-physical) possible objects. At any point in time there are finitely many actual objects. The same is true for past objects that disappeared in the meantime: there were and - summarized - there are finitely many.
Martin,
We agree on the finity/infinity situation. As far as double-counting, I think that is a strength, not a concern. Implication is part of the semantics, and it must be recognized at some point, that is, counted as a semantic structural relation. If speaking English, then the "if... then..." statement is a convenient way to represent that semantic content (but not the only way) with syntactic structure and lexical items, so it counts as that. When speaking another language, the implication remains the same in the semantics, but the syntax and lexical items will differ.
As, according to grammar rules, there is no limit on sentence length (see also http://en.wikipedia.org/wiki/Longest_English_sentence), there are infinitely many possible sequences.
However, sentences containing more than a few dozen words seem to be incomprehensible (cf. http://www.onlinegrammar.com.au/how-many-words-are-too-many-in-a-sentence/). People in conversation use much shorter sentences; perhaps in literature it is possible to find extremely long examples (e.g. Ulysses by Joyce).
If we limit consideration to sentences of a certain length, say 50 words, and assume that the dictionary contains n words, then the number of sentences will be bounded above by n + n^2 + ... + n^50. For English, n is claimed to be about 170,000, so the estimated number is quite large.
The number of words in a simple clause is much smaller. With a limited vocabulary, the number of clauses (or phrases) is not only finite, but it is possible to store them and provide automatic translations (e.g. Google Translate and statistical translation).
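The upper bound n + n^2 + ... + n^50 from the post above is easy to evaluate exactly. A small sketch (n = 170,000 and the 50-word cap are the post's assumptions, and the bound counts all word strings, grammatical or not):

```python
# Evaluate the geometric sum n + n^2 + ... + n^50 for n = 170,000.
n, max_len = 170_000, 50

bound = sum(n ** k for k in range(1, max_len + 1))

# Closed form of the same sum, as a sanity check:
# sum_{k=1}^{m} n^k = (n^(m+1) - n) / (n - 1)
closed = (n ** (max_len + 1) - n) // (n - 1)
assert bound == closed

print(f"roughly 10^{len(str(bound)) - 1} strings")
```

So the bound is astronomically large (on the order of 10^261) but still finite, which is the whole point of the estimate.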
@Piotr
Your example gives the mathematician's view, but the number of sentences is restricted much more than that from a linguist's view. Without favoring one theoretical base over another, just consider the following string: Yellow long old Italian floppy leather sleepy. It has an n less than 50, and the symbols were drawn from among the 170,000, but it is not a sentence (well formed formula). Structure delimits the number of sentences. So by preserving finity by sentence length, you have constrained the set, but by allowing only certain combinations of words based on their categories and the ways that categories combine, the number of grammatical strings is reduced drastically.
@Glenn
Undoubtedly you are right. With a formal grammar model, the number of well-formed sentences will be much smaller. However, to state that something is finite or not, it suffices to give even a rough estimate, in this case based on variations of words.
A more intriguing problem is whether it is possible to produce an infinitely long sentence that still has semantics and carries information.
Some examples:
A sentence with temporal relations ... Tom woke up, THEN Tom got dressed, THEN .... THEN [on the next day] Tom woke up, THEN Tom got dressed ... can be extended indefinitely. Repetitions of clauses are allowed, as they describe different activities.
For sentences describing what is happening (or happens), repetitions of clauses with the same meaning (description of a state or activity) should not be allowed. In this case such a sentence must be finite, as we will run out of noun candidates for subjects (Tom is waking up, Jerry is waking up, ...). However, if numbers are allowed, we may produce infinitely many subjects: "Tom with ID card #1", "Tom with ID card #2", etc.
The number of possible sentences with 50 words is much larger than 170,000. We have to take into account not only 50 possible words but a vocabulary of approx. 5 million words for English. 50-word sentences are types of sentences: when we insert other words from our vocabulary, we get many more sentences of each type.
A sentence cannot be infinitely long because it needs a dot at the end. Otherwise it is ungrammatical and it is not a discrete object that can be counted.
Piotr,
I have read your post again and saw that you did not claim the number of 50-word sentences to be 170,000, but rather n = 170,000 in n + n^2 + … + n^50. Excuse me,
Martin.
Martin,
In fact, I found information that in 2009 Google, by analyzing 15,000,000 books, identified over n = 1,000,000 words in English. Amazing!
http://www.theguardian.com/books/2009/jun/10/english-million-word-milestone
Any well-formed sentence is finite. However, for a sentence finished with a dot, we may apply a production rule extending it. An example is:
. -> and CLAUSE.
which adds the CLAUSE at the end. Hence, theoretically, there is no upper bound on the length of a compound sentence.
Best regards
Piotr
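The production rule ". -> and CLAUSE." discussed above can be sketched as a tiny string rewriter. The sample clauses are invented for illustration; the point is that each application replaces the final dot and appends one more clause, so no finished sentence is ever the longest possible one.

```python
# Apply the rule ". -> and CLAUSE." to a sentence ending in a dot.

def extend(sentence, clause):
    """Replace the final dot with ' and <clause>.'"""
    assert sentence.endswith(".")
    return sentence[:-1] + " and " + clause + "."

s = "Tom reads."
for clause in ["Jerry sleeps", "Ann sings"]:
    s = extend(s, clause)

print(s)  # Tom reads and Jerry sleeps and Ann sings.
```

Every output is a finite, well-formed sentence, yet the rule can always be applied once more, which is exactly the "no upper bound" situation described above.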
Piotr,
Exactly! We could replace all the dots in a novel and get a text that consists of only one sentence. As I wrote earlier in this discussion with Ben Galatzer Levy and H.G. Callaway: this was reflected in traditional logical notation where the symbol for “and” was a dot. Smart!
The other way around, we can get more sentences by putting in more dots. But what, then, is the shortest sentence? Is it always possible to have one-word sentences like the last sentence of the previous section?
Best,
-Martin.
Martin,
In English (I think) one-word sentences can be used to express orders or exclamations. In Polish (and probably other Slavic languages) an indicative sentence can be reduced to one verb; in this case, its conjugation form implies the subject.
I agree with Glenn that it is not the syntax that really matters, but the underlying semantic model. Returning to part of your question about a language with 30 words...
1) Assuming these 30 words include AND and several nouns and verbs (but not conjunctions indicating temporal relations), a sentence of any length can be produced.
2) If the semantic model for the sentence is a snapshot in time (a world), then the number of such models will be:
a) finite, if repeated utterances are not counted, e.g. 10*(Tom reads books) is modeled as one statement
b) infinite, if utterances are counted
c) may be inconsistent: "Tom reads books and Tom does not read books"
3) If the alphabet (30 words) contains temporal conjunctions (e.g. after, then, next), then the semantic model would be a sequence of worlds or a tree of worlds (the latter, if OR is used). The sequence can be infinite; however, the worlds may repeat (they will be discernible by their number in the sequence).
To conclude, in my opinion, infinitely many sentences (or a sentence of arbitrary length) may describe a finite number of worlds, if temporal relations are excluded. To compare with the chess game: there is a finite number of situations on the chessboard.
If temporal relations are allowed, the semantic model can be infinite (like observation of two Kings dancing around at the end of a chess game).
Please observe that the IF ... THEN ... construct that was mentioned by Glenn is not a temporal relation, but rather an axiom or belief.
Best regards
Piotr
It is easy to describe how one could make a very long sentence:
""a" is a string and "aa" is a string and "aaa" is a string and "aaaa" is a string..."
The dots indicate that there is no reason why one should restrict to any specific length of the sentence. In this sense there is no upper bound on the length of the sentence. Still, the sentence would terminate at some stage if it were stated by a mortal human.
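The construction above can be written as a generator that yields the sentence at each stage. A small sketch: every yielded sentence is finite and grammatical, but the lengths grow without bound, which is the sense in which there is no upper bound.

```python
# At stage k, the sentence asserts for each i <= k that "a"*i is a string.
from itertools import islice

def string_sentences():
    """Yield ever-longer sentences of the form '"a" is a string and ...'."""
    clauses = []
    i = 1
    while True:
        clauses.append(f'"{"a" * i}" is a string')
        yield " and ".join(clauses) + "."
        i += 1

for s in islice(string_sentences(), 3):
    print(s)
```

Running this prints the first three stages; `islice` is what lets a mortal human stop after finitely many of them.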
Another issue is that it is not clear what the question means for a natural language. If I say
""one" is a number and "two" is a number"
I think the meaning is the same as if I first say
""one" is a number"
and then say
""two" is a number".
Whether a concatenation of sentences should be counted as one or more sentences is just a convention. The conventions about how we count sentences are different in different languages.
Best regards,
Peter
Short sentences:
I don’t know what we should do about the peculiarities of languages. What I had in mind was that possible answers can consist of one word. “What color is this?” – “Red.” That’s fine. Anyway, the sentence “Red.” cannot stand alone.
Are there sentences that cannot be an answer to any question?
Piotr,
You talked about words and also about worlds. I am not sure that I got it quite right. In philosophy there are worlds sometimes understood as sets of sentences. This could be an interesting new dimension for our discussion. Often propositions are used instead of sentences when worlds are discussed.
Wrong claims and inconsistent utterances are “perfect” sentences for our “purposes” here.
Peter,
Wouldn’t it be better to make strings like “abc…” instead of “aaa…”? I think it is agreed that the length of a sentence is always specific when we are talking about a specific sentence. No sentence without a dot.
The question whether there are infinitely many objects, because you can always go on pointing to another object, is an old one. It came up several times in this thread. Once we have n sentences we can always have another sentence; astonishingly, this works even with two sentences. Different from that is the question whether we can build n+1 sentences after having n sentences, but this was not the original question. Of course we can. The question was whether the chains of symbols will need to become longer and longer. Up until now no contributor has come up with another solution. The chains will become longer. I would love to hear your ideas!
Peter,
You spoke about the possibility of splitting sentences: instead of saying “a and b are numbers” we could say “a is a number” and “b is a number”. Ok. I think the probability of repeating known sentences will grow when we make them shorter. The other side of the coin: again we will have to form longer sentences!
We should not make the strings as a, ab, abc, ... because we will run out of characters after just 26 strings.
Hi Peter,
I must confess I didn’t really understand “aaa…” and I thought it would be better to take different symbols. For me there could be more than 26 – we should not take these symbols for the words themselves. But this is not so important, I guess.
I make the symbols NP, VP, PP, etc. constraining the mathematical set to items that not only fit the mathematical model, but have explanatory significance about human language.
[Note: Noun Phrase, Verb Phrase, Prepositional/Postpositional Phrase, etc.]
@Martin
1) Worlds
The term "world" I would rather apply to the semantic model behind a set of statements in a language.
For example, in the Kripke model a world is an assignment of truth values to propositions, and several worlds linked by temporal relations may exist.
If a graph model for a set of statements is selected, e.g. RDF, a world can be perceived as a cluster in the graph, determined according to a selected feature.
2) Short, one word sentences
They are allowed in natural languages because they are assigned an implicit meaning related to the previous context:
Example 1
- What is the color of this book?
- Red (means this book is red)
Example 2
- What is the color of this car?
- Red (means this car is red)
Typically, if you hear someone murmuring "red" out of the blue, you would attribute it to an internal mental process giving the context, which is not known to the observer.
@Piotr
I agree about the meaning of one-word "sentences" being drawn from context, but would go one step further and say that the syntax, as well, is drawn from context, so that at one level of analysis, the sentence is actually a sentence and not a single word.
"Who did Mary kiss?" comes from a transformation that relates it to the declarative version of "Mary kissed Wh-someone" where the "Wh-word" (a technical term for linguists) is in the string where information is needed. So if the answer is the one-word "John," the underlying syntax to the answer is "Mary kissed John." The meaning (semantics) will then associate with that sentence as if it were all said.
@Glenn
It seems that the correctness of sentences with wh-words can be analyzed by taking possible rephrasings into account.
("Who did Mary kiss?", "John.") -> ("Who did Mary kiss?", "Mary kissed John.")
However, there is no unambiguous rephrasing for:
("Who kissed whom?", "Mary") -> ???
On the other hand:
("Who went where?", "Mary") -> ("Where did Mary go?")
@Piotr
Your correct observation about English is noted. A sentence with more than one wh-word cannot have a one-word response, except perhaps in the craziest circumstance: "I didn't hear that; who shot whom in the foot?" "John," in a case where John shot himself. (This takes about four times as long to process as a straightforward answer would, and it certainly is not standard English.) The two-word answer "John, Bill" doesn't seem to work at all.
Of course, Mary would only fit the "who" in the last example, but it doesn't suffice to answer the question. The final answer would be in the same general syntactic form as the question: "Who went where?" "Mary went to Berlin."
The length of a sentence in a natural language is bounded by the human capacity for reading or listening to the sentence. By the way, this capacity is also the origin of the object called "sentence", and it is why all languages on Earth are built from sentences. The set of possible words being finite, it follows that the number of sentences is finite. This contrasts with mathematical logic, where all the objects called theories (introduced as the deductive closure of a set of axioms, or as the set of formal sentences true in some model) are infinite. It is also an important component of the general truth that the natural sciences can give only an approximate and partial description of the world.
@ Mihai
You will find, if you read through all the messages, that modern linguists use the mathematical model of recursion to generate infinitely many possible sentences from a finite number of symbols. This applies mainly to the sub-field of syntax; other neural or pragmatic constraints, as you mention, may restrict this number.
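To make this concrete, here is a minimal sketch (my own toy illustration in Python, with an invented embedding rule, not an actual grammar of English) of how a single recursive rule yields a different sentence for every embedding depth:

```python
def sentence(depth):
    """Toy recursive rule: S -> "Mary said that " + S | "the dog slept".
    Each extra level of embedding yields a new, strictly longer sentence."""
    if depth == 0:
        return "the dog slept"
    return "Mary said that " + sentence(depth - 1)

# Depths 0, 1, 2, ... all produce distinct sentences:
for d in range(3):
    print(sentence(d))
```

Since every depth yields a distinct string, the set of sentences generable by the model is infinite even though the vocabulary and the rule set are finite; only pragmatic or neural limits cut the set down.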
@ Glenn Bingham
Glenn, you are right. I mentioned the objects called "theories" in mathematical logic, and I did not mention other objects, such as "formal languages" in computer science. Formal languages are produced by recursive definitions such as Chomsky grammars, or are recognized by devices like automata, automata with restricted memory, or Turing machines. The notion of "formal language" comes originally from linguistic studies, and the objects are again infinite. This infinity has a good motivation: it is not clear what the bound on the length of a sentence is, so the sentence seems to be potentially infinite. Friedrich Dürrenmatt has a novella called
Der Auftrag oder Vom Beobachten des Beobachters der Beobachter. Novelle in 24 Sätzen. Diogenes, Zürich 1986 [The Assignment, or On Observing the Observer of the Observers. A Novella in 24 Sentences].
This novella consists of 24 very long sentences; in fact every chapter is a single sentence, which ends with a period at the end of the chapter.
In a certain sense we can enlarge the meaning of the concept "sentence" to the dimension of a chapter, of a book, of the contents of a hard disk, or even of a national library. However, finiteness will remain an essential property of the number of possible sentences, with all the other consequences from my first post. At least, this is what I think now.
Let us try to formulate a question that will give an upper bound on the number in question. We start with N (N finite) words. These words will be referred to as primary words. Two or more primary words may be combined, in order, to form secondary words. The maximum number of primary words that can be included in a secondary word is M, where M is a finite number. Primary and secondary words together form the set of words W. A sentence is a sequence of words taken from W. A particular sentence may be permuted to get a new sentence. A sentence is allowed to contain r words, where r is between 2 and P and P is a finite number. Let S be the set of all possible sentences. I think it is possible to prove that the cardinality of S is finite. Can we prove that the cardinality of S will always be greater than the number of sentences in a natural language containing N primary words?
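Under these stipulations the bound can be computed directly. The sketch below is my own illustration of this model, assuming "permute" means ordered selections of distinct primary words; it counts |W| and then sums |W|^r over the allowed sentence lengths:

```python
from math import perm  # perm(n, k) = n! / (n - k)!

def count_words(N, M):
    """|W|: the N primary words plus all ordered selections of
    2..M distinct primary words (one secondary word per selection)."""
    return N + sum(perm(N, k) for k in range(2, M + 1))

def count_sentences(N, M, P):
    """Upper bound on |S|: all sequences of 2..P words drawn from W."""
    W = count_words(N, M)
    return sum(W ** r for r in range(2, P + 1))

# Tiny example: N=5 primary words, secondary words of up to M=3 parts,
# sentences of at most P=2 words.
print(count_words(5, 3))         # 5 + 20 + 60 = 85
print(count_sentences(5, 3, 2))  # 85**2 = 7225
```

Every term in both sums is finite, so |S| is finite, which supports the claim; whether |S| exceeds the number of sentences of an actual natural language with N primary words is then an empirical question.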
Dear Anup,
Thank you for your answer. I see two aspects in it: for the first aspect we may look at Chris Ransford’s answer (Feb 25). The second aspect is about the vocabulary and the question whether the combination of words in order to shape new words will reduce or enlarge the number of possible sentences. I find this quite interesting.
While English tends to have new (shorter) words for some objects, other languages use combinations. E.g. in English a book containing phone numbers is called a directory; in German it is a “Telefonbuch”, a quite logical combination of Telefon (telephone) and Buch (book). This leads to constructions like [Donaudampfschifffahrtsgesellschaftskapitän], a [captain on a steamship on the Danube belonging to a certain steamship company]. One word or 13 words.
Here comes the question: longer and shorter elements of sentences – will they lead to more or less possible sentences? Here is the last part of my post from 7/16/14 about short sentences as elements of longer sentences:
“Peter,
You spoke about the possibility of splitting sentences: instead of saying “a and b are numbers” we could say “a is a number” and “b is a number”. Ok. I think the probability of repeating known sentences will grow when we make them shorter. The other side of the coin will be: again we will have to form longer sentences!”
Ok then, here is the variant of this question for words: will we have more possible sentences with one long word like the above-mentioned [Donaudampfschifffahrtsgesellschaftskapitän], or will we have more possible sentences when we are dealing with the 13 words of [captain on a steamship on the Danube belonging to a certain steamship company]? The 13 words can appear in more different permutations, but the resulting sentences may not be more numerous. Why? Because the resulting sentences are “less specific”. Sentences like “The ship is on the Danube” are already in the corpus. Do you think this argument holds?
There is another problem for the vocabulary. Let’s leave that for the next post.
@Anup
I agree with your observations and calculations. As someone whose focus is modeling and explaining the human capacity for language, I have at several junctures in this thread agreed that there must be some limit on sentence length and on the number of sentences in a natural language, for pragmatic reasons. On the other hand, I feel that modeling the syntax of natural language, that which is most closely akin to programming and pure mathematics, is best illustrated by allowing an infinite number of possible sentences through recursion in the combining rules of the model.
The key to finding the set of sentences finite, as you have so clearly laid out, lies in your stipulation that P is a finite number. I find it a better explanation to attribute the finitude of P to practical matters (the sun burning out someday) or to neuroscience (limits on cognitive processing), rather than to the mathematics of syntax. Therefore, I would not restrict P to be finite in modeling the syntax of natural language, but would refer students to matters of practicality or of brain function to arrive at that finitude.
So to mathematicians, this will likely seem to be trifling, but to linguists, it seems necessary to break the science down into components and separate their respective effects on the whole. Each component can be set up under a different mathematical model.
@Martin
After a compound is formed, I think it adds to the possible sentences in a language. If telefonbuch is coined, previously one might say (you can do the German better than I) "a book for listing telephone numbers" or a "number listing book." Take all the sentences that these noun phrases can be used in as subjects, objects, objects of prepositions and then substitute the new compound in all those places.
Sometimes compounds take on a life of their own, such that the following is not a contradiction: My blackboard is green. It fits where the longer version might not: *My black piece of slate is green (although one might argue that the equivalent would be more like "My traditionally black piece of slate is green"). In any event, we can add the compound in where other descriptions have served and possibly use them more places.
If you want to have some fun, take the German words "kinder" and "garten" and compound them into kindergarten (children's garden), and it becomes an English word for the first year of formal school (or now sometimes second year of school, the first sometimes called "pre-school," which makes no sense at all).
@Glenn Bingham,
That P is finite may be justified by the fact that a sentence must always be terminated by a period "." to be grammatically correct. Martin has already indicated this in one of his posts.
Dear Martin,
Finiteness assumptions for N, M and P may be justified for a natural language. However, if secondary words are also allowed to take part in the formation of new words, I doubt whether the number of words would remain finite, even if N and M are assumed to be finite.
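This worry can be checked directly: if the words formed in one round may themselves be combined in the next round, the vocabulary grows without bound even for N = 2 and M = 2. A small sketch (my own illustration, treating word formation as plain concatenation):

```python
from itertools import product

def grow(vocab, M, rounds):
    """Close the vocabulary under compounding for a number of rounds.
    Each round adds every concatenation of 2..M existing words."""
    for _ in range(rounds):
        new = set()
        for k in range(2, M + 1):
            for combo in product(sorted(vocab), repeat=k):
                new.add("".join(combo))
        vocab = vocab | new
    return vocab

# N = 2 primary words, compounds of at most M = 2 parts:
sizes = [len(grow({"a", "b"}, 2, r)) for r in range(3)]
print(sizes)  # [2, 6, 30] -- strictly growing
```

Because each round produces strictly longer words, no round can be the last to add something new, so the set of words is infinite unless the compounding of secondary words is itself restricted.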