배열 항목의 가중 무작위 샘플 * 대체 없음 *
Javascript / ECMAScript 6 특정 솔루션이 필요합니다.
각 개체에 대한 가중치 배열을 사용하여 개체 배열에서 임의의 샘플을 생성하고 싶습니다. 인구 목록에는 구성원 유형이 아닌 실제 인구 구성원이 포함됩니다. 하나의 샘플을 선택한 후에는 다시 선택할 수 없습니다.
제가 작업중인 것과 유사한 문제는 체스 토너먼트의 가능한 결과를 시뮬레이션하는 것입니다. 각 플레이어의 등급은 체중입니다. 플레이어는 토너먼트 당 한 번만 (1 위, 2 위 또는 3 위) 배치 할 수 있습니다.
To pick a likely list of the top 3 winners could look like:
let winners = wsample(chessPlayers, // population
playerRatings, // weights
3); // sample size
The weighted list may, or may not, be integer values. It could be floats like [0.2, 0.1, 0.7, 0.3]
, or it could be integers like [20, 10, 70, 30]
. The weights do not have to add up to a value that represents 100%.
Peter below gave me a good reference on a general algorithm, however It's not specific to JS: https://stackoverflow.com/a/62459274/7915759 it may be a good point of reference.
Solutions to the problem that rely on generating a second population list with each member copied weight number of times may not be a practical solution. Each weight in the weights array could be very high numbers, or they could be fractional; basically, any non-negative value.
Some additional questions:
- Is there already an
accumulate()
function available in JS? - Is there a
bisect()
type function in JS that does a binary search of sorted lists? - Are there any efficient and low memory footprint JS modules available with statistical functions that include solutions for the above?
답변
The following implementation selects k
out of n
elements, without replacement, with weighted probabilities, in O(n + k log n) by keeping the accumulated weights of the remaining elements in a sum heap:
function sample_without_replacement<T>(population: T[], weights: number[], sampleSize: number) {
let size = 1;
while (size < weights.length) {
size = size << 1;
}
// construct a sum heap for the weights
const root = 1;
const w = [...new Array(size) as number[], ...weights, 0];
for (let index = size - 1; index >= 1; index--) {
const leftChild = index << 1;
const rightChild = leftChild + 1;
w[index] = (w[leftChild] || 0) + (w[rightChild] || 0);
}
// retrieves an element with weight-index r
// from the part of the heap rooted at index
const retrieve = (r: number, index: number): T => {
if (index >= size) {
w[index] = 0;
return population[index - size];
}
const leftChild = index << 1;
const rightChild = leftChild + 1;
try {
if (r <= w[leftChild]) {
return retrieve(r, leftChild);
} else {
return retrieve(r - w[leftChild], rightChild);
}
} finally {
w[index] = w[leftChild] + w[rightChild];
}
}
// and now retrieve sampleSize random elements without replacement
const result: T[] = [];
for (let k = 0; k < sampleSize; k++) {
result.push(retrieve(Math.random() * w[root], root));
}
return result;
}
The code is written in TypeScript. You can transpile it to whatever version of EcmaScript you need in the TypeScript playground.
Test code:
const n = 1E7;
const k = n / 2;
const population: number[] = [];
const weight: number[] = [];
for (let i = 0; i < n; i++) {
population[i] = i;
weight[i] = i;
}
console.log(`sampling ${k} of ${n} elments without replacement`);
const sample = sample_without_replacement(population, weight, k);
console.log(sample.slice(0, 100)); // logging everything takes forever on some consoles
console.log("Done")
Executed in Chrome, this samples 5 000 000 out of 10 000 000 entries in about 10 seconds.
This is one approach, but not the most efficient.
The highest level function. It iterates k
times, calling wchoice()
each time. To remove the currently selected member from the population, I just set its weight to 0.
/**
* Produces a weighted sample from `population` of size `k` without replacement.
*
* @param {Object[]} population The population to select from.
* @param {number[]} weights The weighted values of the population.
* @param {number} k The size of the sample to return.
* @returns {[number[], Object[]]} An array of two arrays. The first holds the
* indices of the members in the sample, and
* the second holds the sample members.
*/
function wsample(population, weights, k) {
let sample = [];
let indices = [];
let index = 0;
let choice = null;
let acmwts = accumulate(weights);
for (let i=0; i < k; i++) {
[index, choice] = wchoice(population, acmwts, true);
sample.push(choice);
indices.push(index);
// The below updates the accumulated weights as if the member
// at `index` has a weight of 0, eliminating it from future draws.
// This portion could be optimized. See note below.
let ndecr = weights[index];
for (; index < acmwts.length; index++) {
acmwts[index] -= ndecr;
}
}
return [indices, sample];
}
The section of code above that updates the accumulated weights array is the point of inefficiency in the algorithm. Worst case it's O(n - ?)
to update on every pass. Another solution here follows a similar algorithm to this, but uses a heap to reduce the work needed to maintain the accumulated weights array at O(log n)
.
wsample()
calls wchoice()
which selects one member from the weighted list. wchoice()
generates an array of cumulative weights, generates a random number from 0 to the total sum of the weights (last item in the cumulative weights list). Then finds its insertion point in the cumulative weights; which is the winner:
/**
* Randomly selects a member of `population` weighting the probability each
* will be selected using `weights`. `accumulated` indicates whether `weights`
* is pre-accumulated, in which case it will skip its accumulation step.
*
* @param {Object[]} population The population to select from.
* @param {number[]} weights The weights of the population.
* @param {boolean} [accumulated] true if weights are pre-accumulated.
* Treated as false if not provided.
* @returns {[number, Object]} An array with the selected member's index and
* the member itself.
*/
function wchoice(population, weights, accumulated) {
let acm = (accumulated) ? weights : accumulate(weights);
let rnd = Math.random() * acm[acm.length - 1];
let idx = bisect_left(acm, rnd);
return [idx, population[idx]];
}
Here's a JS implementation I adapted from the binary search algorithm from https://en.wikipedia.org/wiki/Binary_search_algorithm
/**
* Finds the left insertion point for `target` in array `arr`. Uses a binary
* search algorithm.
*
* @param {number[]} arr A sorted ascending array.
* @param {number} target The target value.
* @returns {number} The index in `arr` where `target` can be inserted to
* preserve the order of the array.
*/
function bisect_left(arr, target) {
let n = arr.length;
let l = 0;
let r = n - 1;
while (l <= r) {
let m = Math.floor((l + r) / 2);
if (arr[m] < target) {
l = m + 1;
} else if (arr[m] >= target) {
r = m - 1;
}
}
return l;
}
I wasn't able to find an accumulator function ready-made for JS, so I wrote a simple one myself.
/**
* Generates an array of accumulated values for `numbers`.
* e.g.: [1, 5, 2, 1, 5] --> [1, 6, 8, 9, 14]
*
* @param {number[]} numbers The numbers to accumulate.
* @returns {number[]} An array of accumulated values.
*/
function accumulate(numbers) {
let accm = [];
let total = 0;
for (let n of numbers) {
total += n;
accm.push(total)
}
return accm;
}