Brian ([personal profile] memnus) wrote, 2007-10-09 07:16 pm

For the probabilists in the audience

I'd like your thoughts on a problem. (Here, all "randomly chosen" numbers are drawn from the same continuous real distribution.) Suppose you have a set S of n randomly chosen numbers, and a set D, initially empty. Begin by removing the largest number from S, placing it in D, and adding another random number to S. Now take the largest number in S that is smaller than the last number you chose, place it in D, and add a new random number to S in its place. Repeat until you've taken the smallest number from S.

- What is the distribution of numbers in D?
- What is the expected value for the size of D?
- Bonus question (I know the answer to this one!): Why is Brian thinking about this problem?

I'm about to start in on a Monte Carlo estimator of the problem, but I'd like some theoretical input.
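Something like this rough Python sketch is what I mean (assuming uniform draws on [0,1], and that the process stops once S holds nothing smaller than the last number taken; the names are illustrative):

    import random

    def run_process(n):
        S = [random.random() for _ in range(n)]
        D = []
        last = float("inf")              # so the first pick is the overall maximum
        while True:
            candidates = [x for x in S if x < last]
            if not candidates:           # nothing in S below the last pick: done
                break
            last = max(candidates)       # largest element below the last pick
            S.remove(last)
            D.append(last)
            S.append(random.random())    # replace the removed element
        return D

    n, trials = 10, 100_000
    sizes = [len(run_process(n)) for _ in range(trials)]
    print("mean |D| for n =", n, ":", sum(sizes) / trials)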

click

[identity profile] archonsengine.livejournal.com 2007-10-10 02:27 am (UTC)
Why is Brian thinking about this problem?

Oooh, I know! It's 'cuz he's a nerd, right? =o]


More seriously, my gut tells me that the expected value for the size of D is probably infinite. I was never all that good at this sort of math, though, and I'm ridiculously out of practice. Still, I'm going with "infinite".

almost...

[identity profile] tortoise.livejournal.com 2007-10-10 04:46 am (UTC)
Unless I'm confused about probability (which is very possible), I have most of a combinatorial solution to this problem. The short answer is: don't bother, because it's bloody awful (except maybe for the purpose of proving Dan wrong with some slightly-less-awful estimation, but I'm not even quite to the point where that's possible yet).

The long answer follows.

First, the thing I might be confused about: you can simulate choosing $n$ numbers uniformly from an interval $[0,\alpha]$ by first choosing the largest of them (say $y$) with a pdf proportional to $x^{n-1}$, and then picking the rest of them uniformly from the interval $[0,y]$. If this is false, my entire argument is hosed, and it's the kind of thing I get confused about easily, but I'm pretty sure it's true.
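To spell out the check: if $X_1,\dots,X_n$ are independent uniforms on $[0,\alpha]$, then
$$
P\left(\max_i X_i \le y\right) = \left(\frac{y}{\alpha}\right)^n
\qquad\Longrightarrow\qquad
f_{\max}(y) = \frac{n\,y^{n-1}}{\alpha^n} \propto y^{n-1},
$$
and, conditioned on the maximum equaling $y$, the remaining $n-1$ numbers are independent uniforms on $[0,y]$. So the claim is true.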

Now, if the number you're adding to $S$ is ever larger than something you removed, you can totally ignore it, thereby reducing the problem to something like its instance with $n-1$ numbers. (I say "something like" because you're more likely to generate something that gets thrown away, since you're picking from a larger interval. This is a pain to fix, so I'll just worry about how long it takes to get rid of the first guy.)

This leads to the following way of rephrasing the problem:

1. Generate the largest number $y_0$ of $n$ things in $[0,1]$. Put it in $D$.
2. With probability $1-y_0$, be done (i.e., reduced to $n-1$ things). Otherwise, generate the largest number $y_1$ of $n$ things in $[0,y_0]$.
3. With probability $1-y_1$, be done. Otherwise, generate the largest number $y_2$ of $n$ things in $[0,y_1]$.
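(This chain is also easy to sample directly, using the fact above in reverse: the largest of $n$ uniforms on $[0,y]$ has cdf $(t/y)^n$, so it can be drawn by inverse cdf as $y\,U^{1/n}$ with $U$ uniform on $[0,1]$. A rough Python sketch of the rephrased process, with illustrative names:)

    import random

    def steps_until_done(n):
        # y_0: max of n uniforms on [0,1], drawn by inverse cdf
        y = random.random() ** (1.0 / n)
        steps = 1
        while random.random() < y:            # continue with probability y_k
            # y_{k+1}: max of n uniforms on [0, y_k]
            y *= random.random() ** (1.0 / n)
            steps += 1
        return steps

    n, trials = 10, 100_000
    print(sum(steps_until_done(n) for _ in range(trials)) / trials)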

So we just need to evaluate the expected value $E_k$ of $1-y_k$ for each $k$. This is given by
$$
E_k=\frac{\int_{y_0=0}^1 \int_{y_1=0}^{y_0} \dots \int_{y_k=0}^{y_{k-1}} (1-y_k) y_k^{n-1}y_{k-1}^{n-1} \dots y_1^{n-1}y_0^{n-1} \, dy_k \, dy_{k-1} \dots dy_1 \, dy_0}{\int_{y_0=0}^1 \int_{y_1=0}^{y_0} \dots \int_{y_k=0}^{y_{k-1}} y_k^{n-1}y_{k-1}^{n-1} \dots y_1^{n-1}y_0^{n-1} \, dy_k \, dy_{k-1} \dots dy_1 \, dy_0}
$$
which ends up giving $1-\prod_{i=1}^{k+1} \frac{in}{in+1}$: integrating the $y_k$ term of the numerator through all $k+1$ nested integrals yields $\prod_{i=1}^{k+1}\frac{1}{in+1}$, while the denominator yields $\prod_{i=1}^{k+1}\frac{1}{in}$. So, if we've reached the $k^{\rm th}$ step, we'll continue to the $(k+1)^{\rm st}$ step with probability $\prod_{i=1}^k \frac{in}{in+1}$; you can use this to find the expected time it takes to eliminate a number from consideration.
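(A quick symbolic sanity check of that product for the small case $n=2$, $k=1$, using sympy; both printed values come out to $7/15$:)

    import sympy as sp

    n, k = 2, 1
    y0, y1 = sp.symbols("y0 y1", positive=True)
    w = y1**(n - 1) * y0**(n - 1)        # the weight in both integrands
    num = sp.integrate(sp.integrate((1 - y1) * w, (y1, 0, y0)), (y0, 0, 1))
    den = sp.integrate(sp.integrate(w, (y1, 0, y0)), (y0, 0, 1))
    product = sp.Integer(1)
    for i in range(1, k + 2):            # i runs from 1 to k+1
        product *= sp.Rational(i * n, i * n + 1)
    print(sp.simplify(num / den), 1 - product)   # E_k computed two ways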

In particular, the odds of continuing are never more than $\frac{n}{n+1}$, so the expected number of steps before we lose a number is definitely less than $n+1$.
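(Explicitly, dominating by a geometric series with ratio $\frac{n}{n+1}$ gives
$$
E[\text{steps}] \le \sum_{k=0}^{\infty} \left(\frac{n}{n+1}\right)^{k} = n+1.
$$
)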

(Unfortunately, once we lose a number this analysis doesn't quite apply. Intuitively, though, it should not-apply by some constant factor, and so in particular the number of steps required to finish should be finite. Making this rigorous is left as an exercise to the reader.)

[identity profile] camlost.livejournal.com 2007-10-10 04:50 am (UTC)

I don't believe Dan. Suppose we set n=1: we always collect the first element, and there is a 50% chance of replacing it with something smaller. Continuing this argument ad infinitum suggests we collect on average about 2 elements — though each successive continuation is actually less likely than 50% (the k-th requires k+1 uniform draws in decreasing order), so the true mean is somewhat smaller. Either way, it's finite.

So, the probability of adding an additional useful element is x (assuming we populate S with elements between 0 and 1). Unfortunately, I don't know how fast we go through x, so I can't really do the integral. I think you expect to go through x at a rate of around x * (S-1)/S. This should allow for a diffeq to be solved or somesuch, but this is all just hacked-together thoughts near midnight.
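(For what it's worth, a tiny Python check of the n=1 case, under the interpretation that you keep collecting while each fresh draw lands below the last number taken; the mean comes out near e - 1 ≈ 1.718 rather than 2:)

    import random

    def run_n1():
        last = random.random()    # collect the first element
        count = 1
        while True:
            new = random.random()
            if new >= last:       # replacement isn't smaller: done
                return count
            last = new            # replacement is smaller: collect it too
            count += 1

    trials = 200_000
    print(sum(run_n1() for _ in range(trials)) / trials)   # ~ 1.718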