combinatoricsGeneral.Rd
Generate combinations or permutations of a vector with or without constraints.
Produce results in parallel using the Parallel
or nThreads
arguments. You can also apply each of the five compiled functions given by the argument constraintFun
in parallel.
Alternatively, the arguments lower
and upper
make it possible to generate combinations/permutations in chunks allowing for parallelization via the parallel package. This is convenient when you want to apply a custom function to the output in parallel.
Attack integer partition and general subset sum problems.
GMP support allows for exploration of combinations/permutations of vectors with many elements.
The output is in lexicographical order.
comboGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL) permuteGeneral(v, m = NULL, repetition = FALSE, freqs = NULL, lower = NULL, upper = NULL, constraintFun = NULL, comparisonFun = NULL, limitConstraints = NULL, keepResults = NULL, FUN = NULL, Parallel = FALSE, nThreads = NULL, tolerance = NULL)
v  Source vector. If 

m  Number of elements to choose. If 
repetition  Logical value indicating whether combinations/permutations should be with or without repetition. The default is 
freqs  A vector of frequencies used for producing all combinations/permutations of a multiset of 
lower  The lower bound. Combinations/permutations are generated lexicographically, thus utilizing this argument will determine which specific combination/permutation to start generating from (e.g. 
upper  The upper bound. Similar to If the output is constrained and In addition to the benefits listed for 
constraintFun  Function to be applied to the elements of 
comparisonFun  Comparison operator that will be used to compare When
In other words, the first comparison operator is applied to the first limit and the second operator is applied to the second limit. 
limitConstraints  This is the value(s) that will be used for comparison. Can be passed as a single value or a vector of two numerical values. The default is 
keepResults  A logical flag indicating if the result of E.g. The following are equivalent and will produce a \(4^{th}\) column of row sums:

FUN  Function to be applied to each combination/permutation. The default is 
Parallel  Logical value indicating whether combinations/permutations should be generated in parallel using \(n  1\) threads, where \(n\) is the maximum number of threads. The default is 
nThreads  Specific number of threads to be used. The default is 
tolerance  A numeric value greater than or equal to zero. This parameter is utilized when a constraint is applied on a numeric vector. The default value is 0 when it can be determined that whole values are being utilized, otherwise it is 
In general, a matrix is returned with each row containing a vector of length \(m\) or \(m + 1\) depending on the value of keepResults
. If FUN
is utilized, a list is returned.
Finding all combinations/permutations with constraints is optimized by organizing them in such a way that when constraintFun
is applied, a partially monotonic sequence is produced. Combinations/permutations are added successively, until a particular combination exceeds the given constraint value for a given constraint/comparison function combo. After this point, we can safely skip several combinations knowing that they will exceed the given constraint value.
When there are any negative values in v
and constraintFun = "prod"
, producing a monotonic set is nontrivial for the general case. As a result, performance will suffer as all combinations/permutations must be tested against the constraint criteria. Additionally, upper
will not have its normal effectiveness (i.e. it will only limit the number of rows after producing all combinations/permutations).
Parallel
and nThreads
will be ignored in the following cases:
When the output is constrained.
If the class of the vector passed is character
, raw
, and complex
(N.B. Rcpp::CharacterMatrix
is not thread safe). Alternatively, you can generate an indexing matrix in parallel.
If FUN
is utilized.
If either constraintFun
, comparisonFun
or limitConstraints
is NULL
or if the class of the vector passed is logical
, character
, raw
, and complex
, the constraint check will not be carried out. This is equivalent to simply finding all combinations/permutations of \(v\) choose \(m\).
The maximum number of combinations/permutations that can be generated at one time is \(2^{31}  1\). Utilizing lower
and upper
makes it possible to generate additional combinations/permutations.
Factor vectors are accepted. Class and level attributes are preserved.
Lexicographical ordering isn't guaranteed for permutations if lower
isn't supplied and the output is constrained.
If lower
is supplied and the output is constrained, the combinations/permutations that will be tested will be in the lexicographical range lower
to upper
or up to the total possible number of results if upper
is not given. See the second paragraph for the definition of upper
.
FUN
will be ignored if the constraint check is satisfied.
#> user system elapsed #> 0.001 0.000 0.001#> user system elapsed #> 0.016 0.009 0.026#> user system elapsed #> 0.01 0.00 0.01#> user system elapsed #> 0.007 0.003 0.010## permutations of the multiset (with or w/o setting m) : ## c(1,1,1,1,2,2,2,3,3,4) system.time(permuteGeneral(4, freqs = c(4,3,2,1)))#> user system elapsed #> 0.000 0.000 0.001#### Examples using "upper" and "lower": ## Generate some random data set.seed(1009) mySamp = sort(rnorm(75, 997, 23)) permuteCount(75, 10, freqs = rep(1:3, 25))#> Big Integer ('bigz') : #> [1] 4606842579278688000# >Big Integer ('bigz') : # >[1] 4606842576291495952 ## See specific range of permutations permuteGeneral(75, 10, freqs = rep(1:3, 25), lower = 1e12, upper = 1e12 + 10)#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] #> [1,] 1 2 2 10 22 18 69 57 47 25 #> [2,] 1 2 2 10 22 18 69 57 47 26 #> [3,] 1 2 2 10 22 18 69 57 47 27 #> [4,] 1 2 2 10 22 18 69 57 47 28 #> [5,] 1 2 2 10 22 18 69 57 47 29 #> [6,] 1 2 2 10 22 18 69 57 47 30 #> [7,] 1 2 2 10 22 18 69 57 47 31 #> [8,] 1 2 2 10 22 18 69 57 47 32 #> [9,] 1 2 2 10 22 18 69 57 47 33 #> [10,] 1 2 2 10 22 18 69 57 47 34 #> [11,] 1 2 2 10 22 18 69 57 47 35## Researcher only needs 1000 7tuples of mySamp ## such that the sum is greater than 7200. system.time(comboGeneral(mySamp, 7, FALSE, constraintFun = "sum", comparisonFun = ">", limitConstraints = 7200, upper = 1000))#> user system elapsed #> 0.000 0.000 0.001## Similarly, you can use "lower" to obtain the last rows. ## Generate the last 10 rows system.time(comboGeneral(mySamp, 7, lower = choose(75, 7)  9))#> user system elapsed #> 0.001 0.000 0.000## Or if you would like to generate a specific chunk, ## use both "lower" and "upper". E.g. Generate one ## million combinations starting with the 900,000,001 ## lexicographic combination. t1 = comboGeneral(mySamp, 7, lower = 9*10^8 + 1, upper = 9*10^8 + 10^6) ## class of the source vector is preserved class(comboGeneral(5,3)[1,]) == class(1:5)#> [1] TRUE#> [1] TRUE#> [1] TRUE## Using keepResults will add a column of results t2 = permuteGeneral(3,6,TRUE,constraintFun = "prod", keepResults = TRUE) t3 = comboGeneral(3,6,TRUE,constraintFun = "sum", comparisonFun = "==", limitConstraints = 8, keepResults = TRUE) ## Using multiple constraints: ## Get combinations such that the product ## is between 3000 and 4000 inclusive comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c(">=","<="), limitConstraints = c(3000, 4000), keepResults = TRUE)#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #> [1,] 1 1 5 5 5 5 5 3125 #> [2,] 1 2 3 4 5 5 5 3000 #> [3,] 1 2 3 5 5 5 5 3750 #> [4,] 1 2 4 4 4 5 5 3200 #> [5,] 1 2 4 4 5 5 5 4000 #> [6,] 1 3 3 3 5 5 5 3375 #> [7,] 1 3 3 4 4 5 5 3600 #> [8,] 1 3 4 4 4 4 4 3072 #> [9,] 1 3 4 4 4 4 5 3840 #> [10,] 2 2 2 3 5 5 5 3000 #> [11,] 2 2 2 4 4 5 5 3200 #> [12,] 2 2 2 4 5 5 5 4000 #> [13,] 2 2 3 3 4 5 5 3600 #> [14,] 2 2 3 4 4 4 4 3072 #> [15,] 2 2 3 4 4 4 5 3840 #> [16,] 2 3 3 3 3 4 5 3240 #> [17,] 2 3 3 3 4 4 4 3456 #> [18,] 3 3 3 3 3 3 5 3645 #> [19,] 3 3 3 3 3 4 4 3888## Or, get the combinations such that the ## product is less than or equal to 10 or ## greater than or equal to 40000 comboGeneral(5, 7, TRUE, constraintFun = "prod", comparisonFun = c("<=",">="), limitConstraints = c(10, 40000), keepResults = TRUE)#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] #> [1,] 1 1 1 1 1 1 1 1 #> [2,] 1 1 1 1 1 1 2 2 #> [3,] 1 1 1 1 1 1 3 3 #> [4,] 1 1 1 1 1 1 4 4 #> [5,] 1 1 1 1 1 1 5 5 #> [6,] 1 1 1 1 1 2 2 4 #> [7,] 1 1 1 1 1 2 3 6 #> [8,] 1 1 1 1 1 2 4 8 #> [9,] 1 1 1 1 1 2 5 10 #> [10,] 1 1 1 1 1 3 3 9 #> [11,] 1 1 1 1 2 2 2 8 #> [12,] 5 5 5 5 5 5 5 78125 #> [13,] 5 5 5 5 5 5 4 62500 #> [14,] 5 5 5 5 5 5 3 46875 #> [15,] 5 5 5 5 5 4 4 50000 #> [16,] 5 5 5 5 4 4 4 40000#### General subset sum problem set.seed(516781810) comboGeneral(runif(100, 0, 42), 5, constraintFun = "mean", comparisonFun = "==", limitConstraints = 30, tolerance = 0.0000002)#> [,1] [,2] [,3] [,4] [,5] #> [1,] 5.081371 27.64046 36.13466 39.61189 41.53162 #> [2,] 7.554598 31.01344 34.74310 37.39019 39.29867 #> [3,] 13.544197 29.86346 34.34605 36.11163 36.13466#### Integer Partitions comboGeneral(0:5, 5, TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 5)#> [,1] [,2] [,3] [,4] [,5] #> [1,] 0 0 0 0 5 #> [2,] 0 0 0 1 4 #> [3,] 0 0 0 2 3 #> [4,] 0 0 1 1 3 #> [5,] 0 0 1 2 2 #> [6,] 0 1 1 1 2 #> [7,] 1 1 1 1 1## Using FUN comboGeneral(10000, 5, lower = 20, upper = 22, FUN = function(x) { which(cummax(x) %% 2 == 1) })#> [[1]] #> [1] 1 3 #> #> [[2]] #> [1] 1 3 5 #> #> [[3]] #> [1] 1 3 #>if (FALSE) { ## Parallel example generating more than 2^31  1 combinations. library(parallel) numCores = detectCores()  1 system.time(mclapply(seq(1, comboCount(35, 15), 10086780), function(x) { a = comboGeneral(35, 15, lower = x, upper = x + 10086779) ## do something x }, mc.cores = numCores)) ## Find 13tuple combinations of 1:25 such ## that the mean is less than 10 system.time(myComb < comboGeneral(25, 13, FALSE, constraintFun = "mean", comparisonFun = "<", limitConstraints = 10)) ## Alternatively, you must generate all combinations and subsequently ## subset to obtain the combinations that meet the criteria system.time(myComb2 < combn(25, 13)) system.time(myCols < which(colMeans(myComb2) < 10)) system.time(myComb2 < myComb2[, myCols]) ## Any variation is much slower system.time(myComb2 < combn(25, 13)[,combn(25, 13, mean) < 10]) ## Test equality with myComb above all.equal(myComb, t(myComb2)) ## much faster using Parallel = TRUE system.time(permuteGeneral(15, 8, lower = 250000000, Parallel = TRUE)) system.time(permuteGeneral(15, 8, lower = 250000000)) system.time(comboGeneral(30, 10, Parallel = TRUE)) system.time(comboGeneral(30, 10)) ## Parallel also works when applying constraintFun solely system.time(comboGeneral(30, 10, Parallel = TRUE, constraintFun = "sum")) system.time(comboGeneral(30, 10, constraintFun = "sum")) ## Using Parallel argument only uses 1 minus the maximal # of threads. ## The below should run a bit faster as we are utilizing all threads. maxThreads = RcppAlgos::stdThreadMax() system.time(comboGeneral(30, 10, nThreads = maxThreads, constraintFun = "sum")) ## Depending on # of cores available, using Parallel with ## constraintFun is faster than rowSums or rowMeans alone combs = comboGeneral(30, 10) system.time(rowSums(combs)) ## Fun example... see stackoverflow: ## https://stackoverflow.com/q/22218640/4408538 system.time(permuteGeneral(seq(0L,100L,10L),8,TRUE, constraintFun = "sum", comparisonFun = "==", limitConstraints = 100)) }