Brian ([personal profile] memnus) wrote, 2007-10-09 07:16 pm

For the probabilists in the audience

I'd like your thoughts on a problem. (Here, all "randomly chosen" numbers are drawn from the same continuous real distribution.) Suppose you have a set S of n randomly chosen numbers, and a set D, initially empty. Begin by removing the largest number from S, placing it in D, and adding another random number to S. Now take the largest number in S that is smaller than the last number you chose, place it in D, and add a new random number to S in its place. Repeat until you've taken the smallest number from S.

- What is the distribution of numbers in D?
- What is the expected value for the size of D?
- Bonus question (I know the answer to this one!): Why is Brian thinking about this problem?

I'm about to start in on a Monte Carlo estimator of the problem, but I'd like some theoretical input.
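Something like this rough Python sketch is what I mean (assuming uniform draws on [0,1], and that the process stops once S holds nothing smaller than the last number taken; the names are illustrative):

    import random

    def run_process(n):
        S = [random.random() for _ in range(n)]
        D = []
        last = float("inf")              # so the first pick is the overall maximum
        while True:
            candidates = [x for x in S if x < last]
            if not candidates:           # nothing in S below the last pick: done
                break
            last = max(candidates)       # largest element below the last pick
            S.remove(last)
            D.append(last)
            S.append(random.random())    # replace the removed element
        return D

    n, trials = 10, 100_000
    sizes = [len(run_process(n)) for _ in range(trials)]
    print("mean |D| for n =", n, ":", sum(sizes) / trials)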

click

[identity profile] archonsengine.livejournal.com 2007-10-10 02:27 am (UTC)
Why is Brian thinking about this problem?

Oooh, I know! It's 'cuz he's a nerd, right? =o]


More seriously, my gut tells me that the expected value for the size of D is probably infinite. I was never all that good at this sort of math, though, and I'm ridiculously out of practice. Still, I'm going with "infinite".

almost...

[identity profile] tortoise.livejournal.com 2007-10-10 04:46 am (UTC)
Unless I'm confused about probability (which is very possible), I have most of a combinatorial solution to this problem. The short answer is: don't bother, because it's bloody awful (except maybe for the purpose of proving Dan wrong with some slightly-less-awful estimation, but I'm not even quite to the point where that's possible yet).

The long answer follows.

First, the thing I might be confused about: you can simulate choosing $n$ numbers uniformly from an interval $[0,\alpha]$ by first choosing the largest of them (say $y$) with a pdf proportional to $x^{n-1}$, and then picking the rest of them uniformly from the interval $[0,y]$. If this is false, my entire argument is hosed, and it's the kind of thing I get confused about easily, but I'm pretty sure it's true.
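To spell out the check: if $X_1,\dots,X_n$ are independent uniforms on $[0,\alpha]$, then
$$
P\left(\max_i X_i \le y\right) = \left(\frac{y}{\alpha}\right)^n
\qquad\Longrightarrow\qquad
f_{\max}(y) = \frac{n\,y^{n-1}}{\alpha^n} \propto y^{n-1},
$$
and, conditioned on the maximum equaling $y$, the remaining $n-1$ numbers are independent uniforms on $[0,y]$. So the claim is true.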

Now, if the number you're adding to $S$ is ever larger than something you removed, you can totally ignore it, thereby reducing the problem to something like its instance with $n-1$ numbers. (I say "something like" because you're more likely to generate something that gets thrown away, since you're picking from a larger interval. This is a pain to fix, so I'll just worry about how long it takes to get rid of the first guy.)

This leads to the following way of rephrasing the problem:

1. Generate the largest number $y_0$ of $n$ things in $[0,1]$. Put it in $D$.
2. With probability $1-y_0$, be done (i.e., reduced to $n-1$ things). Otherwise, generate the largest number $y_1$ of $n$ things in $[0,y_0]$.
3. With probability $1-y_1$, be done. Otherwise, generate the largest number $y_2$ of $n$ things in $[0,y_1]$.
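(This chain is also easy to sample directly, using the fact above in reverse: the largest of $n$ uniforms on $[0,y]$ has cdf $(t/y)^n$, so it can be drawn by inverse cdf as $y\,U^{1/n}$ with $U$ uniform on $[0,1]$. A rough Python sketch of the rephrased process, with illustrative names:)

    import random

    def steps_until_done(n):
        # y_0: max of n uniforms on [0,1], drawn by inverse cdf
        y = random.random() ** (1.0 / n)
        steps = 1
        while random.random() < y:            # continue with probability y_k
            # y_{k+1}: max of n uniforms on [0, y_k]
            y *= random.random() ** (1.0 / n)
            steps += 1
        return steps

    n, trials = 10, 100_000
    print(sum(steps_until_done(n) for _ in range(trials)) / trials)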

So we just need to evaluate the expected value $E_k$ of $1-y_k$ for each $k$. This is given by
$$
E_k=\frac{\int_{y_0=0}^1 \int_{y_1=0}^{y_0} \dots \int_{y_k=0}^{y_{k-1}} (1-y_k) y_k^{n-1}y_{k-1}^{n-1} \dots y_1^{n-1}y_0^{n-1} \, dy_k \, dy_{k-1} \dots dy_1 \, dy_0}{\int_{y_0=0}^1 \int_{y_1=0}^{y_0} \dots \int_{y_k=0}^{y_{k-1}} y_k^{n-1}y_{k-1}^{n-1} \dots y_1^{n-1}y_0^{n-1} \, dy_k \, dy_{k-1} \dots dy_1 \, dy_0}
$$
which ends up giving $1-\prod_{i=1}^{k+1} \frac{in}{in+1}$: integrating the $y_k$ term of the numerator through all $k+1$ nested integrals yields $\prod_{i=1}^{k+1}\frac{1}{in+1}$, while the denominator yields $\prod_{i=1}^{k+1}\frac{1}{in}$. So, if we've reached the $k^{\rm th}$ step, we'll continue to the $(k+1)^{\rm st}$ step with probability $\prod_{i=1}^k \frac{in}{in+1}$; you can use this to find the expected time it takes to eliminate a number from consideration.
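(A quick symbolic sanity check of that product for the small case $n=2$, $k=1$, using sympy; both printed values come out to $7/15$:)

    import sympy as sp

    n, k = 2, 1
    y0, y1 = sp.symbols("y0 y1", positive=True)
    w = y1**(n - 1) * y0**(n - 1)        # the weight in both integrands
    num = sp.integrate(sp.integrate((1 - y1) * w, (y1, 0, y0)), (y0, 0, 1))
    den = sp.integrate(sp.integrate(w, (y1, 0, y0)), (y0, 0, 1))
    product = sp.Integer(1)
    for i in range(1, k + 2):            # i runs from 1 to k+1
        product *= sp.Rational(i * n, i * n + 1)
    print(sp.simplify(num / den), 1 - product)   # E_k computed two ways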

In particular, the odds of continuing are never more than $\frac{n}{n+1}$, so the expected number of steps before we lose a number is definitely less than $n+1$.
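(Explicitly, dominating by a geometric series with ratio $\frac{n}{n+1}$ gives
$$
E[\text{steps}] \le \sum_{k=0}^{\infty} \left(\frac{n}{n+1}\right)^{k} = n+1.
$$
)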

(Unfortunately, once we lose a number this analysis doesn't quite apply. Intuitively, though, it should not-apply by some constant factor, and so in particular the number of steps required to finish should be finite. Making this rigorous is left as an exercise to the reader.)

[identity profile] camlost.livejournal.com 2007-10-10 04:50 am (UTC)

I don't believe Dan. Suppose we set n=1: we always collect the first element, and there is a 50% chance of replacing it with something smaller. Continuing this argument ad infinitum suggests we collect on average about 2 elements — though each successive continuation is actually less likely than 50% (the k-th requires k+1 uniform draws in decreasing order), so the true mean is somewhat smaller. Either way, it's finite.

So, the probability of adding an additional useful element is x (assuming we populate S with elements between 0 and 1). Unfortunately, I don't know how fast we go through x, so I can't really do the integral. I think you expect to go through x at a rate of around x * (S-1)/S. This should allow for a diffeq to be solved or somesuch, but this is all just hacked-together thoughts near midnight.
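(For what it's worth, a tiny Python check of the n=1 case, under the interpretation that you keep collecting while each fresh draw lands below the last number taken; the mean comes out near e - 1 ≈ 1.718 rather than 2:)

    import random

    def run_n1():
        last = random.random()    # collect the first element
        count = 1
        while True:
            new = random.random()
            if new >= last:       # replacement isn't smaller: done
                return count
            last = new            # replacement is smaller: collect it too
            count += 1

    trials = 200_000
    print(sum(run_n1() for _ in range(trials)) / trials)   # ~ 1.718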