Invisible link to canonical for Microformats

Mersenne Primes.


Searching for rare numbers with python.


26 oct 2024

Introduction

Update: I recorded myself reading 419 digits from the 52nd Mersenne prime for Say the prime.

The 52nd Mersenne prime was discovered on October 21, 2024 (Source: GIMPS). Despite intense effort, it took six years to surpass the 51st! It is the largest prime number ever found so far! A prime number is a number that is only divisible by 1 and itself. A Mersenne prime is a prime number of the form \(m = 2^p - 1\), where \(p\) itself is a prime number.

The first few Mersenne primes (source: Wikipedia) are:

3, 7, 31, 127, 8191, 131071, 524287, 2147483647

The specific form of Mersenne primes enables the use of a fast prime-testing algorithm called the Lucas-Lehmer test. Mersenne primes are closely connected to perfect numbers — numbers that are equal to the sum of their proper divisors. Perfect numbers take the form \(N = 2^{p-1}(2^p-1)\), where \(2^p - 1\) is a Mersenne prime.

Mersenne primes are rare, and the numbers grow exponentially as \(p\) increases, making them increasingly difficult to discover. The computational effort needed to verify whether a number of the form \(2^p - 1\) is prime also grows significantly, requiring sophisticated algorithms and powerful hardware, especially for large candidates. In this blog post, I will generate some Mersenne primes and explore how fast they grow.

Python setup

For this project, I use the standard Python library, SymPy for fast prime number calculations, and tabulate for formatting tables in a readable way. I use the following imports:

import time 
import sympy
import tabulate

To easily print tables, I created tabuliterate, which runs a function on each value of an iterable and optionally applies a condition.

def tabuliterate(fun, iterable, cond):
    table = []
    t0 = time.perf_counter() 
    for i, j in enumerate(iterable):
        n = fun(j)
        if cond is not None and not cond(n):
            continue
        t = time.perf_counter()
        table.append([i, j, t - t0, n])
        t0 = t
    return table

Checking for prime numbers

One can check if a candidate number \(n\) is prime by verifying if it is divisible by any smaller prime up to \(\sqrt{n}\). Here’s a simple python implementation of this:

def is_prime(n):
    if n < 2:
        return False
    for i in range(2, int(n**0.5) + 1):
        if n % i == 0:
            return False
    return True

Let’s use this to print all primes in the range (1, 50).

%%time
prime_table = tabuliterate(is_prime, range(1, 50), cond=lambda x: x)
print(
    tabulate.tabulate(
        prime_table,
        headers=["i", "prime(i)", "t (s)", "is_prime"],
        tablefmt="github"))

The tabuliterate function runs is_prime for each value in the range, and saves the results if is_prime returns True. I then use tabulate to print a well-formatted, blog-ready Markdown table. To better understand performance, I use time.perf_counter to measure the time between iterations (in column t (s)). Additionally, I use the Jupyter magic %%time to print timing information for the entire cell. While these are not substitutes for proper profiling, they provide sufficient insight for this blog, optimization is not the main focus.

CPU times: user 91 μs, sys: 7 μs, total: 98 μs
Wall time: 101 μs
i prime(i) t (s) is_prime
1 2 1.4666e-05 True
2 3 2.45901e-06 True
3 5 3.66601e-06 True
4 7 3.45899e-06 True
5 11 5.83302e-06 True
6 13 3.208e-06 True
7 17 4.959e-06 True
8 19 2.541e-06 True
9 23 5.58398e-06 True
10 29 6.916e-06 True
11 31 2.79202e-06 True
12 37 7.125e-06 True
13 41 5.04201e-06 True
14 43 3.375e-06 True
15 47 4.95798e-06 True
16 53 7.04202e-06 True
17 59 8.41598e-06 True
18 61 3.667e-06 True
19 67 7.208e-06 True
20 71 5.04201e-06 True
21 73 3.54199e-06 True
22 79 7.125e-06 True
23 83 5.37501e-06 True
24 89 6.666e-06 True
25 97 8.542e-06 True

Now let’s try a few bigger numbers, including powers of ten and some known Mersenne primes:

%%time 
candidates = 1000, 10000, 100000, 1000000, 131071, 524287, 2147483647
is_prime_table = tabuliterate(is_prime, candidates)
print(tabulate.tabulate(
    is_prime_table,
    headers=["i", "n", "t (s)", "is_prime"],
    tablefmt="github"))
CPU times: user 8.42 ms, sys: 367 μs, total: 8.79 ms
Wall time: 8.92 ms
i n t (s) is_prime
0 1000 1.1542e-05 False
1 10000 4.29098e-06 False
2 100000 3.29202e-06 False
3 1000000 2.29198e-06 False
4 131071 4.9333e-05 True
5 524287 0.000114417 True
6 2147483647 0.00870742 True

Using this very basic and unoptimized prime checker, we can still test prime numbers relatively quickly. However, by the time we reach the 9th Mersenne prime, the computational effort required is no longer feasible for this method without significant optimization.

Generating Prime Numbers

Before attempting to find Mersenne primes, we first need to generate prime numbers for the Mersenne exponents. Specifically, we need a function to generate the nth prime: p = prime(n). We could use the is_prime function and loop over all possible values of n, but as Mersenne primes grow very quickly, this approach will soon face performance limitations. The is_prime function doesn’t store previously checked values, leading to redundant calculations.

To overcome this, we can use a cache dictionary {n: prime(n)} to store previously computed primes, allowing us to generate the nth prime more efficiently by avoiding unnecessary recomputation. Here is an implementation to compute the n-th prime:

def nth_prime(n, cache):
    if n in cache: return cache[n]
    c, num = max(cache.keys()), cache[max(cache.keys())] + 2
    while c < n:
        is_prime = True
        limit = int(num**0.5) + 1
        for p in cache.values():
            if p > limit: break
            if num % p == 0:
                is_prime = False
                break
        if is_prime: c, cache[c + 1] = c + 1, num
        num += 2
    return cache[n]

Now let’s generate some prime numbers:

%%time
cache = {1: 2, 2: 3, 3: 5}
nth_primes_table = tabuliterate(
    lambda x: nth_prime(x, cache=cache),
    (10**i for i in range(3, 7))
)
print(
    tabulate.tabulate(
        nth_primes_table,
        headers=["i", "n", "t (s)", "prime(n)"],
        tablefmt="github"))
CPU times: user 20.6 s, sys: 124 ms, total: 20.8 s
Wall time: 20.8 s
i n t (s) prime(n)
0 1000 0.00183408 7907
1 10000 0.0355349 104717
2 100000 0.770876 1299653
3 1000000 19.9434 15485837

We are able to generate the millionth prime, but the approach becomes impractical for larger primes. Notably, the 8th Mersenne prime is already 2,147,483,647, making this method unsuitable without significant optimization for larger primes.

Finding Mersenne primes

Instead of trying to reinvent a faster and smarter prime generator and tester, here is the same test implemented with SymPy:

%%time
sympy_nth_primes_table = tabuliterate(
    lambda x: sympy.prime,
    (10**i for i in range(3, 7))
)
print(tabulate.tabulate(
    sympy_nth_primes_table, 
    headers=["i", "n", "t (s)", "prime(n)"],
    tablefmt="github"))
CPU times: user 47 μs, sys: 1 μs, total: 48 μs
Wall time: 57 μs
i n t (s) prime(n)
0 1000 0.0163215 7919
1 10000 0.00630858 104729
2 100000 0.0111348 1299709
3 1000000 0.0328108 15485863

This approach is significantly faster. Now, let’s try finding some Mersenne primes! To do this, we use sympy.isprime as a filter to determine if numbers of the form are prime:

%%time
N=600
mersenne_primes_table = tabuliterate(
    lambda x: 2**x-1,
    (sympy.prime(n) for n in range(1, N)),
    sympy.isprime
)
CPU times: user 1min 14s, sys: 574 ms, total: 1min 14s
Wall time: 1min 14s

It takes a little over one minute to test the first 600 primes. However, we can optimize this further by leveraging SymPy’s `is_mersenne_prime, which is specifically designed for testing Mersenne primes:

%%time
N=600
mersenne_primes_table = tabuliterate(
    lambda x: 2**x-1,
    (sympy.prime(n) for n in range(1, N)),
    sympy.ntheory.primetest.is_mersenne_prime
)
CPU times: user 1min 7s, sys: 94.7 ms, total: 1min 7s
Wall time: 1min 7s
i n p(n) t (s) D(m) m(p)
1 0 2 4.0125e-05 1 3
2 1 3 9.33401e-06 1 7
3 2 5 6.666e-06 2 31
4 3 7 5.24998e-06 3 127
5 5 13 1.8209e-05 4 8191
6 6 17 0.0152827 6 131071
7 7 19 0.00496579 6 524287
8 10 31 0.0177748 10 2147483
9 17 61 0.0320199 19 2305843…3693951
10 23 89 0.02063 27 6189700…9562111
11 27 107 0.0131295 33 1622592…0288127
12 30 127 0.00890125 39 1701411…4105727
13 97 521 0.241578 157 6864797…5057151
14 110 607 0.0484024 183 5311379…1728127
15 206 1279 0.405773 386 1040793…8729087
16 327 2203 0.534715 664 1475979…7771007
17 338 2281 0.0510051 687 4460875…2836351
18 454 3217 0.548838 969 2591170…9315071
19 582 4253 0.639087 1281 1907970…0484991
20 601 4423 0.0981187 1332 2855425…8580607
21 1195 9689 3.24355 2917 4782202…5754111
22 1225 9941 0.17318 2993 3460882…9463551
23 1356 11213 0.750064 3376 2814112…6392191
24 2253 19937 5.40748 6002 4315424…8041471
25 2434 21701 1.1508 6533 4486791…1882751
26 2590 23209 0.983179 6987 4028741…9264511
27 4623 44497 13.4968 13395 8545098…1228671
28 8383 86243 27.3762 25962 5369279…3438207

We managed to test up to and found the first 28 Mersenne primes in about a minute!

Conclusion

Mersenne primes are fascinating mathematical objects with deep connections to number theory. Discovering them requires both computational ingenuity and efficient algorithms, as they grow exponentially large and exceedingly rare. I was amazed I could generate a 25962 digit Mersenne prime on my laptop in a minute. However, noticing that the last number took more than twice as much as the previous one, and the 52 is another 24 steps away… the resent discovery really is something.


Related