Try with 10 000 or 100 000

Yeah, the performance of mergesort vs. quicksort is strange. Mergesort is guaranteed to be $$O(n\log n)$$ for all inputs. Quicksort has a worst case of $$O(n^2)$$, which kind of ironically comes up (with the classic first-element pivot choice) if the input is already sorted or nearly sorted. But its *average* time over a large set of randomly chosen inputs is $$O(n\log n)$$. (Someone got a CS PhD for proving that.) And for random inputs, quicksort generally turns out faster, in practice, than mergesort. (Unless what you have is a linked list rather than an array, in which case mergesort wins.)
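To make that worst case concrete, here's a minimal quicksort sketch (not any particular library's implementation) using the naive first-element pivot; on an already-sorted input, every partition puts nothing in `smaller` and everything in `larger`:

```python
def quicksort(items):
    """Quicksort sketch with the naive first-element pivot.

    On an already-sorted input this pivot is always the minimum,
    so `smaller` is empty and `larger` has n-1 items on every call:
    that degenerate recursion is the O(n^2) worst case."""
    if len(items) <= 1:
        return items
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)

print(quicksort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```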

But when the asymptotic analysis of algorithms ("asymptotic" ≈ "for large inputs") is a near-tie, the determining factor is often how the data are arranged in the computer's memory. Quicksort, I think, has better locality of reference, and so it has better cache performance on real computers. (But maybe the locality of reference isn't so good when you have an array of pointers to strings, rather than an array whose items are the actual data values; that could answer the question about why strings behave differently.)
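For contrast, a minimal mergesort sketch (again just a sketch, not any particular implementation): note that the merge step only ever looks at the front of its two inputs, purely sequential access, which is part of why mergesort adapts so well to linked lists, though this list version copies data as it goes:

```python
def mergesort(items):
    """Mergesort sketch: split in half, sort each half, merge.

    Guaranteed O(n log n): the recursion is always about log n
    levels deep, and each level does O(n) merging work."""
    if len(items) <= 1:
        return items
    mid = len(items) // 2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    # Merge: only ever compares the front element of each half,
    # so this step needs nothing but sequential access.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:]

print(mergesort([3, 1, 4, 1, 5, 9, 2, 6]))  # [1, 1, 2, 3, 4, 5, 6, 9]
```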

This is going to take a long time: (starting with 1000 items, adding 9000 more, with a dot printed now and then as a progress indicator)

1 dot...2 dots...3 dots...4 dots...5 dots...6 dots...7 dots...8 dots...9 dots...done.

13 seconds with mergesort on 10000 items.

25 seconds with partition sort on 10000 items.

>5 minutes with insertion sort on 10000 items.

No idea, but from context it's a variation of quicksort.

Mergesort is $$O(n\log n)$$ in both the average and the worst case, but what's its *best-case* performance?

I think it's still $$O(n\log n)$$, because mergesort has to go through the whole algorithm before it knows that it's looking at an optimal input: it does all $$\log n$$ levels of splitting and merging no matter what, and a sorted input only saves some comparisons inside each merge, which doesn't change the order of growth. The reason Quicksort has a worst-case time that's worse than its average time is that you might be really unlucky and choose a pivot value that's the smallest or largest in the (sub)list on every recursive call, so one partition is empty and the other has length $$n-1$$.
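One way to check both claims empirically is to instrument each sort to count comparisons. This is a hedged sketch (it counts one comparison per element per merge or partitioning pass, not exact machine operations), run on sorted vs. shuffled input:

```python
import random
import sys

def mergesort_count(items):
    """Mergesort returning (sorted_list, comparison_count)."""
    if len(items) <= 1:
        return items, 0
    mid = len(items) // 2
    left, cl = mergesort_count(items[:mid])
    right, cr = mergesort_count(items[mid:])
    merged, i, j, comps = [], 0, 0, cl + cr
    while i < len(left) and j < len(right):
        comps += 1
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    return merged + left[i:] + right[j:], comps

def quicksort_count(items):
    """First-element-pivot quicksort returning (sorted_list,
    comparison_count), counting each element's comparison against
    the pivot once per partitioning pass."""
    if len(items) <= 1:
        return items, 0
    pivot, rest = items[0], items[1:]
    smaller, cs = quicksort_count([x for x in rest if x < pivot])
    larger, cl = quicksort_count([x for x in rest if x >= pivot])
    return smaller + [pivot] + larger, len(rest) + cs + cl

# The degenerate quicksort recursion goes n levels deep, so raise
# Python's recursion limit before trying it on sorted input.
sys.setrecursionlimit(10000)

n = 1000
in_order = list(range(n))
shuffled = random.sample(range(n), n)

print(mergesort_count(in_order)[1])   # best case: about (n/2) log2 n, still n log n
print(mergesort_count(shuffled)[1])   # about n log2 n
print(quicksort_count(shuffled)[1])   # roughly 2 n ln n on average
print(quicksort_count(in_order)[1])   # worst case: n(n-1)/2 = 499500
```

The last line is the punchline: quicksort on already-sorted input does about fifty times as many comparisons as mergesort does, while on shuffled input the two are in the same ballpark.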