primeSieve.Rd
Implementation of the segmented sieve of Eratosthenes with wheel factorization. Generates all prime numbers between bound1
and bound2
(if supplied) or all primes up to bound1
. See this stackoverflow post for an analysis on prime number generation efficiency in R: Generate a list of primes up to a certain number
The fundamental concepts of this algorithm are based off of the implementation by Kim Walisch found here: kimwalisch/primesieve.
primeSieve(bound1, bound2 = NULL, nThreads = NULL)
Positive integer or numeric value.
Positive integer or numeric value.
Specific number of threads to be used. The default is NULL
.
It does not matter which bound is larger as the resulting primes will be between min(bound1, bound2)
and max(bound1, bound2)
if bound2
is provided.
The maximum value for either of the bounds is \(2^{53} - 1\).
At the heart of this algorithm is the traditional sieve of Eratosthenes (i.e. given a prime \(p\), mark all multiples of \(p\) as composite), however instead of sieving the entire interval, we only consider small sub-intervals. The benefits of this method are two fold:
Reduction of the space complexity from \(O(n)\), for the traditional sieve, to \(O(\sqrt n)\)
Reduction of cache misses
The latter is of particular importance as cache memory is much more efficient and closer in proximity to the CPU than main memory. Reducing the size of the sieving interval allows for more effective utilization of the cache, which greatly impacts the overall efficiency.
Another optimization over the traditional sieve is the utilization of wheel factorization. With the traditional sieve of Eratosthenes, you typically check every odd index of your logical vector and if the value is true, you have found a prime. With wheel factorization using the first four primes (i.e. 2, 3, 5, and 7) to construct your wheel (i.e. 210 wheel), you only have to check indices of your logical vector that are coprime to 210 (i.e. the product of the first four primes). As an example, with \(n = 10000\) and a 210 wheel, you only have to check 2285 indices vs. 5000 with the classical implementation.
Returns an integer vector if max(bound1, bound2)
\(< 2^{31}\), or a numeric vector otherwise.
## Primes up to a thousand
primeSieve(100)
#> [1] 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97
## Primes between 42 and 17
primeSieve(42, 17)
#> [1] 17 19 23 29 31 37 41
## Equivalent to
primeSieve(17, 42)
#> [1] 17 19 23 29 31 37 41
## Primes up to one hundred million in no time
system.time(primeSieve(10^8))
#> user system elapsed
#> 0.080 0.003 0.084
## options(scipen = 50)
## Generate large primes over interval
system.time(myPs <- primeSieve(10^13+10^6, 10^13))
#> user system elapsed
#> 0.006 0.000 0.006
## Object created is small
object.size(myPs)
#> 267696 bytes
## Using nThreads
system.time(primeSieve(1e7, nThreads = 2))
#> user system elapsed
#> 0.009 0.000 0.005