(Cross posting from !stupid_questions@lemmings.world https://lemmings.world/post/25535229)

A brand new sentence is one which we consider to have never been spoken, written, or thought of (or at least never recorded). Just swapping one noun for another (for example, the name of a place or person) may technically produce a new sentence, but it does not really match the spirit of a brand new sentence.

For the linguists: can you try to come up with a better estimate than just (number of words)^(average sentence length)? Maybe by describing the different forms of verbs (as we do in NLP; verbs which take a DP, verbs which take a CP), then adding standard adjectives, and finishing with the remaining grammar (sorry if I am getting it all wrong, it has been a while since I took my intro to linguistics class). Also, consider a morpheme-less form. This exercise is for a more realistic guess.
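For reference, the naive baseline mentioned above, (number of words)^(average sentence length), can be sketched as follows. The vocabulary size and sentence length are hypothetical placeholders chosen only for illustration, not measured values, and the function name is made up for this sketch. The result counts every possible string of words, grammatical or not, so it is an overcount rather than a realistic estimate.

```python
import math


def naive_sentence_count_log10(vocab_size: int, sentence_length: int) -> float:
    """Return log10 of vocab_size ** sentence_length (the naive baseline)."""
    # Working with the exponent keeps the number readable.
    return sentence_length * math.log10(vocab_size)


# Hypothetical inputs, for illustration only.
VOCAB_SIZE = 20_000
SENTENCE_LENGTH = 10

exponent = naive_sentence_count_log10(VOCAB_SIZE, SENTENCE_LENGTH)
print(f"roughly 10^{exponent:.1f} word strings of length {SENTENCE_LENGTH}")
```

Any grammar-aware estimate of the kind asked for here would have to come in well below this figure, since most of these strings are not sentences at all.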

  • Lvxferre [he/him]@mander.xyzM
    1 day ago

    Poisson does make more sense, and it would be easier to work with. In that case the probability of a single sentence having a specific length n would be p = (λ^n)*[e^(-λ)] / n!; for English λ should be around 18 words/sentence.
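As a rough illustration, the Poisson formula above can be evaluated directly. This is a minimal sketch using only the Python standard library; λ = 18 is the figure quoted in the comment, while the function name and the sample lengths are arbitrary choices. Note that a Poisson distribution also gives a small probability to n = 0, which has no meaning for sentences, so this is a modelling simplification.

```python
import math


def poisson_pmf(n: int, lam: float) -> float:
    """P(N = n) = lam**n * exp(-lam) / n! for a Poisson distribution."""
    # Computed in log space so that large n does not overflow lam**n or n!.
    return math.exp(n * math.log(lam) - lam - math.lgamma(n + 1))


LAMBDA = 18.0  # mean words per sentence, as suggested in the comment

# Probability of a few sample sentence lengths under this model.
for n in (5, 10, 18, 30):
    print(f"P(length = {n:2d}) = {poisson_pmf(n, LAMBDA):.4f}")
```

With an integer λ, the lengths n = λ and n = λ - 1 are equally likely and are the most probable values, so this model puts most of the weight on sentences close to 18 words.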

    > English even has a loophole in ‘;’, which is kind of like a full stop but does not really count as one. (I do not really know how this is classified properly in linguistics; my guess is that it would be a conjunction, but some overpowered kind which allows breaking regular grammar rules.)

    The semicolon is simply punctuation; a conjunction would be a word, like “and”. Since the semicolon is mostly used to connect related albeit independent sentences, I think it’s fair to treat it like a full stop.

    > Please correct me on stuff I got wrong, I am very new to this stuff.

    So am I - my main area of interest is Historical Linguistics, so I’m completely clueless about this stuff. I never thought the statistics classes I got 20y ago in a Chemistry grad would help me with this, but here we are.

    • sgaOP
      1 day ago

      > English λ should be around 18 words/sentence

      Isn't that really high? Does an average sentence really have 18 words? I would love a source.

      > I never thought the statistics classes I got 20y ago in a Chemistry grad would help me with this, but here we are.

      My statistics comes from my QM 1 and 2 and optics classes.

      • Lvxferre [he/him]@mander.xyzM
        1 day ago

        I remember reading this number from style manuals, but the sources I’ve found online are actually consistent with this number - this one for example claiming 15~20 words. It seems to vary an awful lot depending on the topic and the author, though; plus the source above is mostly prescriptive, so take it with a grain of salt.

        • sgaOP
          20 hours ago

          My guess would have been something like 5-10 words (maybe 7). Maybe in literature it would be much higher, since people who write literature (technical or not) write much better than the average person speaks. Averages have to include children under 10, and even 5-year-olds, who might have a hard time stringing 10 words together in a logical manner. It still seems like a crazy fact to me.