(Cross posting from !stupid_questions@lemmings.world https://lemmings.world/post/25535229)
A brand new sentence is a sentence which we consider to have never been spoken, written, or thought of (at least as far as anything recorded goes). Just swapping one noun for another (for example, the name of a place or person) may technically produce a new sentence, but it does not really match the spirit of a brand new sentence.
For the linguists: can you try to come up with a better estimate, better than just (number of all words)^(average sentence length)? Maybe by describing the different forms of verbs (like we consider in NLP; verbs which take a DP, a CP), then adding standard adjectives and finishing with the remaining grammar (sorry if I am getting it all wrong, it has been a while since I took my intro to linguistics class). Also, consider a morpheme-less form. This exercise is for a more realistic guess.
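For reference, the naive upper bound mentioned above can be sketched in a few lines of Python. The vocabulary size of 10,000 words is purely an illustrative assumption, and 18 words is the average sentence length suggested elsewhere in this thread; the function name is made up for this sketch. It works in log10 because the raw number is too large to read comfortably.

```python
import math

def naive_sentence_count_log10(vocab_size: int, sentence_length: int) -> float:
    """Return log10 of vocab_size ** sentence_length.

    This counts every possible string of words, grammatical or not,
    so it is only a crude upper bound.
    """
    return sentence_length * math.log10(vocab_size)

# Hypothetical inputs: a 10,000-word vocabulary and 18-word sentences.
log10_count = naive_sentence_count_log10(vocab_size=10_000, sentence_length=18)
print(f"about 10^{log10_count:.0f} possible word strings")
```

Almost all of those strings are ungrammatical, which is why a grammar-aware estimate should come out far smaller.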
Poisson does make more sense, and it would be easier to work with. In that case the odds of a single sentence having a specific length n would be
p = (λ^n)*[e^(-λ)] / n!
; for English, λ should be around 18 words/sentence.
The semicolon is simply punctuation; a conjunction would be a word, like “and”. Since the semicolon is mostly used to connect related albeit independent sentences, I think it’s fair to treat it like a full stop.
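As a rough check, the Poisson formula above, p = (λ^n)*[e^(-λ)] / n!, can be evaluated directly. This is only a minimal sketch: the function name and the sample lengths are chosen for illustration, and λ = 18 is the estimate from this thread, not a measured value. Note that a Poisson model also assigns a small probability to n = 0, which is not a real sentence.

```python
import math

def poisson_pmf(n: int, lam: float) -> float:
    """Probability that a sentence has exactly n words,
    assuming sentence length is Poisson-distributed with mean lam."""
    # Direct evaluation of (lam^n) * e^(-lam) / n!; fine for the small n
    # used here, but it would overflow for very large n.
    return (lam ** n) * math.exp(-lam) / math.factorial(n)

LAMBDA = 18.0  # assumed average words per sentence, per the thread

# Illustrative sentence lengths, not taken from any data set.
for n in (5, 10, 18, 30):
    print(f"P(length = {n:2d}) = {poisson_pmf(n, LAMBDA):.6f}")
```

Swapping LAMBDA for a lower value, such as 7, shows how strongly the distribution depends on the assumed average.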
So am I - my main area of interest is Historical Linguistics, so I’m completely clueless about this stuff. I never thought the statistics classes I took 20 years ago in a Chemistry degree would help me with this, but here we are.
is not that really huge. Does an average sentence really have 18 words? Would love the source.
My statistics comes from QM 1 and 2 and optics classes.
I remember reading this number in style manuals, and the sources I’ve found online are consistent with it - this one, for example, claims 15~20 words. It seems to vary an awful lot depending on the topic and the author, though; plus the source above is mostly prescriptive, so take it with a grain of salt.
My guess would have been something like 5-10 words (maybe 7). Maybe in literature it would be much higher, as the writing ability of people who write literature (technical or not) is much better than the average stuff an average person says. Averages have to include children under 10, and even 5-year-olds, who might have a hard time stringing 10 words together in a logical manner. It still seems like a crazy fact to me.