I recently decided to think of an algorithm based on natural selection of DNA mutations. Being unclear about its speed complexity, I decided to write it up and test it out. The result was counter intuitive.

First, a simpler variant: Imagine the following sorting algorithm, given an array of numbers: Randomly shuffle the numbers until they are sorted.

What is the median amount of times you end up shuffling the array before it is sorted?

The answer is n! where n is length of the array. This is known as Bogosort.

Now, a slightly more advanced version:

Assume you can tell which numbers are already in their correctly sorted position, and which aren’t. Only randomly shuffle the numbers which are not yet sorted. Keep all the others in place.

What is the median amount of times you end up doing this kind of shuffling before the whole array is sorted?

I’ll be revealing the answer tomorrow.

Edit: The answer is that you end up shuffling only n times.

Addressing some concerns: this is not a real sorting algorithm, because it assumes you already know whether some of the records are sorted or not.

It shows me how “random chance” in DNA mutations can actually occur much faster than we expect. When a better gene allows an organism to survive better, it sticks around, while all of the other useless genes randomly shuffle around until they can become more useful too. This way organisms with a long DNA code can still evolve rather quickly even if it’s by random chance.

  • AbelianGrape@beehaw.org
    link
    fedilink
    English
    arrow-up
    7
    ·
    edit-2
    2 years ago

    I’m not sure the median is what you want. The worst case behavior is unbounded. There is no guarantee that such an algorithm ever actually terminates, and in fact (with very low probability) it may not! But that means there is no well-defined median; we can’t enumerate the space.

    So let’s instead ask about the average, which is meaningful, as the increasingly high iteration-count datapoints are also decreasingly likely, in a way that we can compute without trying to enumerate all possible sequences of shuffles.

    Consider the problem like this: at every iteration, remove the elements that are in the correct positions and continue sorting a shorter list. As long as we keep getting shuffles where nothing is in the correct position, we can go forever. Such shuffles are called derangements, and the probability of getting one is 1/e. That is, the number of derangements of n items is the nearest integer to n!/e, so the probability of a derangement would be 1/n! * [n!/e]. This number converges to 1/e incredibly quickly as n grows - unsurprisingly, the number of correct digits is on the order of the factorial of n.

    We’re now interested in partial derangements D_{n,k}; the number of permutations of n elements which have k fixed points. D_{n,0} is the number of derangements, as established that is [n!/e]. Suppose k isn’t 0. Then we can pick k points to be correctly sorted, and multiply by the number of derangements of the others, for a total of nCk * [(n-k)!/e]. Note that [1/e] is 0, indeed, it’s not possible for exactly one element to be out of place.

    So what’s the probability of a particular partial derangement? Well now we’re asking for D_{n,k}/n!. That would be nCk/n! * [(n-k)!/e]. Let’s drop the nearest integer bit and call it an approximation, then (nCk * (n-k)!)/(n! * e) = 1/(k!*e). Look familiar? That’s a Poisson distribution with λ = 1!

    But if we have a Poisson distribution with λ = 1, then that means that on average we expect one new sorted element per shuffle, and hence we expect to take n shuffles. I’ll admit, I was not expecting that when I started working this out. I wrote a quick program to average some trials as a sanity check and it seems to hold.

    • AbelianGrape@beehaw.org
      link
      fedilink
      English
      arrow-up
      2
      ·
      2 years ago

      Only the number of shuffles is linear. Shuffling an array and marking/deleting correctly-placed elements still take linear time even with a “placement oracle.” It’s at best O(n^2) so the algorithm still wouldn’t be a good sorting algorithm.

      It’s like doing selection sort, except we’re selecting a random set of elements (from that poisson distribution) instead of the smallest one.

  • cgtjsiwy@programming.dev
    link
    fedilink
    English
    arrow-up
    3
    ·
    2 years ago

    What is the median amount of times you end up shuffling the array before it is sorted?

    The answer is n! where n is length of the array.

    I assume you meant mean instead of median. The median of a geometric distribution with parameter p is ceil(-1/log2(1-p)).

  • Arete@lemmy.world
    link
    fedilink
    English
    arrow-up
    1
    ·
    2 years ago

    This is interesting. Naively I’d expect that each number in a shuffled array of length n has a 1/n chance of ending up in the correct position, so there ought to be on average 1 correctly placed number regardless of the array length. I might be neglecting a correlation here, where each incorrectly placed number decreases the odds for the remaining numbers. Assuming the above though, the whole problem becomes recursive, since we’d be left with an unsorted array of length n-1 on average. The expected number of sorts would then just be n. For the time complexity, we’d have O(n) for the original shuffle, plus O(n-1) for the next, and so on, yielding O(n^2) overall.