`vignettes/CombinatorialSampling.Rmd`

`CombinatorialSampling.Rmd`

This document covers topics in generating random samples of combinations/permutations. It is encouraged to read General Combinatorics first.

To illustrate this in `base R`

, let us consider getting 5 random combinations of the vector `1:20`

of length 10. How should we proceed?

A naive approach would be to generate all of the combinations using `combn`

and then call `sample`

:

```
naive <- function(v, m, n, s) {
allCombs <- combn(v, m)
set.seed(s)
allCombs[, sample(ncol(allCombs), n)]
}
fiveRndCombs <- naive(20, 10, 5, 42)
t(fiveRndCombs)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 5 7 8 9 11 13 14 17 18
[2,] 4 6 7 10 11 12 13 14 15 16
[3,] 1 3 4 7 9 11 14 15 17 20
[4,] 3 4 9 10 11 12 14 18 19 20
[5,] 2 4 5 6 8 11 12 16 17 20
```

This is okay for this small example (there are only `choose(20, 10) = 184756`

results), however what if we wanted to find one hundred thousand random combinations from the vector `1:100`

of length 20? Clearly, the approach above will not be feasible as there are far too many results to generate (`choose(100, 20) = 5.359834e+20`

). Furthermore, there are internal limitations on `sample`

. If we try to pass `choose(100, 20)`

, we will get an error:

```
sample(choose(100, 20), 5)
Error in sample.int(x, size, replace, prob) : invalid first argument
```

We could also try calling `sample(100, 20)`

a bunch of times and hope we don’t get duplicate combinations. This is neither promising nor elegant.

`RcppAlgos`

provides three functions: `comboSample`

, `permuteSample`

, and `comboGroupsSample`

for seamlessly attacking these types of problems. All functions provide the following:

- Easily generate random samples of combinations/permutations or partition of groups in parallel.
- You can pass a vector of specific indices or rely on the internal sampling functions. We call
`sample`

when the total number of results is small and for larger cases, the sampling is done in a very similar fashion to`urand.bigz`

from the`gmp`

package. - Consistent interface to their respective general functions (i.e.
`combo/permuteGeneral`

and`comboGroups`

) - Useful when we need a reproducible set of random combinations/permutations or partitions of groups.
- If the gmp library is needed, the
`seed`

parameter must be set in order to have reproducible results (*E.g.*`set.seed()`

) has no effect in these cases).

`comboSample`

and `permuteSample`

Let’s first look at the first problem above (i.e. getting 5 random combinations of the vector `1:20`

of length 10):

```
library(RcppAlgos)
set.seed(42)
comboSample(20, 10, n = 5)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 5 7 8 9 11 13 14 17 18
[2,] 4 6 7 10 11 12 13 14 15 16
[3,] 1 3 4 7 9 11 14 15 17 20
[4,] 3 4 9 10 11 12 14 18 19 20
[5,] 2 4 5 6 8 11 12 16 17 20
## Use the seed argument directly to produce the same output
comboSample(20, 10, n = 5, seed = 42)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 4 5 7 8 9 11 13 14 17 18
[2,] 4 6 7 10 11 12 13 14 15 16
[3,] 1 3 4 7 9 11 14 15 17 20
[4,] 3 4 9 10 11 12 14 18 19 20
[5,] 2 4 5 6 8 11 12 16 17 20
## fiveRndCombs produced above
identical(t(fiveRndCombs),
comboSample(20, 10, n = 5, seed = 42))
[1] TRUE
```

Just like with `comboGeneral`

and `permuteGeneral`

, we can explore results with repetition.

```
comboSample(10, 8, TRUE, n = 3, seed = 84)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 3 3 3 6 6 10 10 10
[2,] 1 3 3 4 4 7 9 10
[3,] 3 7 7 7 9 10 10 10
permuteSample(10, 8, TRUE, n = 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 8 7 8 10 1 2 7 6
[2,] 3 3 8 10 2 4 4 6
[3,] 3 7 8 4 2 9 10 4
comboSample(10, 12, freqs = 1:10, n = 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 1 2 3 4 5 5 6 8 8 9 10 10
[2,] 1 4 4 4 5 5 5 5 5 7 7 7
[3,] 2 3 4 5 5 6 7 7 7 7 7 7
permuteSample(10, 12, freqs = 1:10, n = 3, seed = 123)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
[1,] 4 6 7 7 1 10 8 7 8 7 4 6
[2,] 5 7 7 8 7 7 2 5 5 3 4 2
[3,] 10 6 1 10 8 5 3 9 7 2 9 3
```

`sampleVec`

We can also utilize `sampleVec`

to generate specific results.

```
## E.g. the below generates the 1st, 5th, 25th, 125th, and
## 625th lexicographical combinations
comboSample(10, 8, TRUE, sampleVec = c(1, 5, 25, 125, 625))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 5
[3,] 1 1 1 1 1 1 3 8
[4,] 1 1 1 1 1 3 6 9
[5,] 1 1 1 1 5 6 10 10
## Is the same as:
comboGeneral(10, 8, TRUE)[5^(0:4), ]
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
[1,] 1 1 1 1 1 1 1 1
[2,] 1 1 1 1 1 1 1 5
[3,] 1 1 1 1 1 1 3 8
[4,] 1 1 1 1 1 3 6 9
[5,] 1 1 1 1 5 6 10 10
```

`namedSample`

Have you ever wondered which lexicographical combinations/permutations are returned when sampling? No worries, simply set `namedSample = TRUE`

:

```
testInd <- permuteSample(30, 10, n = 3, seed = 100, namedSample = TRUE)
testInd
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
33554924331145 10 7 24 5 29 6 30 12 16 11
60218249947169 17 18 15 19 14 2 1 4 7 29
51084688265260 15 2 20 27 8 10 25 30 3 18
## Same output as above
permuteSample(30, 10, sampleVec = row.names(testInd))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 10 7 24 5 29 6 30 12 16 11
[2,] 17 18 15 19 14 2 1 4 7 29
[3,] 15 2 20 27 8 10 25 30 3 18
```

Just like the `General`

counterparts, the sampling functions utilize GMP to allow for exploration of combinations/permutations of large vectors where the total number of results is enormous. They also offer parallel options using `Parallel`

or `nThreads`

.

```
## Uses min(stdThreadMax() - 1, 5) threads (in this case)
permuteSample(500, 10, TRUE, n = 5, seed = 123, Parallel = TRUE)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 55 435 274 324 200 152 6 313 121 377
[2,] 196 166 331 154 443 329 155 233 354 442
[3,] 235 325 94 27 370 117 302 86 229 126
[4,] 284 104 464 104 207 127 117 9 390 414
[5,] 456 76 381 456 219 23 376 187 11 123
permuteSample(factor(state.abb), 15, n = 3, seed = 50, nThreads = 3)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] ME FL DE OK ND CA PA AL ID MO NM HI KY MT NJ
[2,] AZ CA AL CT ME SD ID SC OK NH HI TN ND IA MT
[3,] MD MO NC MT NH AL VA MA VT WV NJ NE MN MS MI
50 Levels: AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN ... WY
permuteCount(factor(state.abb), 15)
Big Integer ('bigz') :
[1] 2943352142120754524160000
```

The algorithms are incredibly efficient and offer tremendous gains over the naive approach above:

```
## the function "naive" is defined above
system.time(naive(25, 10, 5, 15))
user system elapsed
3.526 0.065 3.604
system.time(comboSample(25, 10, n = 5, seed = 15))
user system elapsed
0.002 0.000 0.001
```

Even when dealing with extremely large numbers, these algorithms are very fast. And using the parallel options have even greater effects than we saw with the general counterparts (typically around ~2-3 times faster with the general functions, whereas with the last example below with sampling we see a nearly 5 fold improvement).

```
## Lightning fast even with examples involving many results
system.time(comboSample(2500, 100, n = 5, seed = 15))
user system elapsed
0.002 0.000 0.002
## The total number of combinations has ~180 digits
gmp::log10.bigz(comboCount(2500, 100))
[1] 180.9525
## Still fast with larger samples
system.time(comboSample(2500, 100, n = 1e4, seed = 157))
user system elapsed
1.482 0.006 1.491
## Using Parallel/nThreads in these cases has an even greater effect
system.time(comboSample(2500, 100, n = 1e4, seed = 157, nThreads = 8))
user system elapsed
2.409 0.002 0.310
```

Again, just as with the general functions, you can pass a custom function to `combo/permuteSample`

using the `FUN`

argument.

```
permuteSample(5000, 1000, n = 3, seed = 101, FUN = sd)
[[1]]
[1] 1431.949
[[2]]
[1] 1446.859
[[3]]
[1] 1449.272
## Example using complex numbers
myCplx <- as.complex(1:100 + rep(c(-1, 1), 50) * 1i)
permuteSample(myCplx, 10, freqs = rep(1:5, 20),
n = 3, seed = 101, FUN = function(x) {
sqrt(sum(x))
})
[[1]]
[1] 24.83948+0i
[[2]]
[1] 20.9285+0.04778i
[[3]]
[1] 22.20379+0.09007i
```

`comboGroupsSample`

Just as we can generate random samples of combinations and permutations, we are also able to generate random samples of partitions of groups of equal size.

There are many problems that present in this manner. Below, we examine one involving playing cards.

Let’s say we have 4 players and each player is to have 3 cards a piece. Given that the deck is shuffled, the dealer then distrubutes 12 cards.

What possible hands can each player have?

See Creating A Deck Of Cards In R Without Using While And Double For Loop (Credit to @MichaelChirico)

```
cards <- c(2:10, "J", "Q", "K", "A")
suits <- c("♠", "♥", "♦", "♣")
deck <- paste0(rep(cards, length(suits)), #card values
rep(suits, each = length(cards))) #suits
set.seed(1738)
shuffled <- factor(deck[sample(52)], levels = deck)
## Here are 3 possibilities
comboGroupsSample(shuffled[1:12], numGroups = 4, n = 2, seed = 13)
Grp1 Grp1 Grp1 Grp2 Grp2 Grp2 Grp3 Grp3 Grp3 Grp4 Grp4 Grp4
[1,] A♠ 2♥ 6♠ 10♦ 10♥ J♣ 8♠ 10♣ 3♠ Q♦ 6♣ 8♦
[2,] A♠ 10♥ Q♦ 10♦ 2♥ 8♦ 8♠ J♣ 6♣ 10♣ 3♠ 6♠
52 Levels: 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ ... A♣
comboGroupsSample(shuffled[1:12], numGroups = 4, retType = "3Darray",
n = 2, seed = 13, namedSample = TRUE)
, , Grp1
[,1] [,2] [,3]
10939 A♠ 2♥ 6♠
3791 A♠ 10♥ Q♦
, , Grp2
[,1] [,2] [,3]
10939 10♦ 10♥ J♣
3791 10♦ 2♥ 8♦
, , Grp3
[,1] [,2] [,3]
10939 8♠ 10♣ 3♠
3791 8♠ J♣ 6♣
, , Grp4
[,1] [,2] [,3]
10939 Q♦ 6♣ 8♦
3791 10♣ 3♠ 6♠
52 Levels: 2♠ 3♠ 4♠ 5♠ 6♠ 7♠ 8♠ 9♠ 10♠ J♠ Q♠ K♠ A♠ 2♥ 3♥ 4♥ 5♥ 6♥ 7♥ 8♥ ... A♣
```