
Welcome to the Great Internet Mersenne Prime Search! (The not-PC-only version ;)


This ftp site contains Ernst Mayer's C source code for performing Lucas-Lehmer tests and sieve-based trial-factoring of prime-exponent Mersenne numbers. In short, everything you need to search for Mersenne primes on your Intel, AMD or non-x86-CPU-based computer!




Recent News:


23 Jun 2014: Special thanks to Stephen Searle for doing significant amounts of analysis and debug of the code in this version. This release features the following major enhancements and changes:
02 Oct 2013 (Patched rev1 posted 09 May 2014): This features the following major enhancements and changes:
04 Feb 2013:
06 Nov 2009: Well, it took a full year longer than I had hoped, but a tarball of the Mlucas v3.0 beta code described in the entry below is finally available. This has SSE2 inline assembly support for 32-bit Windows and 32/64-bit Linux, but no PrimeNet support (yet) ... the latter will come later this year, if things go reasonably according to plan. A GUI will have to wait for at least another year. But the code is sufficiently ready for early adopters to run on their x86 machines (Win32, 32 and 64-bit Linux and MacOS ... code is most-optimized for the latter) and for builders, profilers and assembler experts to have a look and send me feedback and suggestions for improvement.
15 Sep 2008: Mlucas 3.0 used to verify 45th and 46th known Mersenne primes. Note that the verify runs by Tom Duell and Rob Giltrap of Sun Microsystems used a pre-beta version of Mlucas 3.0, scheduled for official release later this Fall. Key new features of the upcoming release [besides a radically overhauled header-file structure and many other code cleanups and bugfixes] include:

General Questions:

For general questions about the math behind the Lucas-Lehmer test, general primality testing or related topics, check out the Mersenne Forum.


STEP 1 - DOWNLOAD AND BUILD THE CODE

To do Lucas-Lehmer tests, you'll need to build the latest Mlucas C source code release. First get the release tarball, available in 2 different-zip-based forms:

(Code snapshot for both of the above dated 23 Jun 2014, except for the following patched files: 2 Jul: carry_gcc64.h file patched to restore SSE2 compatibility on pre-SSE4.2 systems; 19 Jul: Mlucas.c help menu fixes; 20 Jul: get_cpuid.c "CPU has AVX+FMA3 support?" check function fixed to correctly check for 2 lit bits in CPUID result.)

The build procedure is so simple, I use no makefile - let's illustrate using a multithreaded x86/SSE2 build under 64-bit linux. Note that there are some inline-assembler macros which use most of the 14 available general-purpose registers and as a consequence result in ran-out-of-registers errors in unthreaded mode, thus our default build mode is now multithreaded:

gcc -c -Os -m64 -DUSE_SSE2 -DUSE_THREADS *.c
rm -f rng*.o util.o qfloat.o
gcc -c -O1 -m64 -DUSE_SSE2 -DUSE_THREADS rng*.c util.c qfloat.c
gcc -m64 -o Mlucas *.o -lm -lpthread -lrt


We use -Os (targeting a small object code size) because that yields best performance for SSE2 builds on pre-AVX architectures. For AVX builds on Sandy/Ivy Bridge and AVX2 builds on Haswell/Broadwell you may get a smidge better performance using -O3 or -Ofast instead of -Os, but I have not seen a consistent benefit from doing so, and more importantly (depending on your precise compiler/runtime setup) these don't always play nice with the -mavx/-mavx2 flags (detailed below), which generally will yield a 3-5% speedup on the Sandy/Ivy Bridge and Haswell/Broadwell CPU families, respectively.

The rm command line and the gcc -O1 following it perform a rebuild of a trio of accuracy/optimization-sensitive files at a lower opt-level. None of these files is critical for performance, so it is recommended to always do this step even though many builds of all files with the higher opt level work just fine.
You should expect to see some compiler warnings, mostly of the "type-punned pointer", "signed/unsigned int", "unused variable" and "variable set but not used" (the latter with GCC 4.7 and later) varieties. The first of these is related to the quad-float emulation code used for high-precision double-float-const-initializations. (I try to keep on top of the latter kinds, but with multiple build modes which use various partially-overlapping subsets of variables, were I to set a goal of no such warnings, it would be a nearly full-time job and leave little time for actual new-code development.) Others are mainly of the following kind:

[various]: warning: cast from pointer to integer of different size
twopmodq80.c: In function `twopmodq78_3WORD_DOUBLE':
twopmodq80.c:1032: warning: right shift count >= width of type
twopmodq80.c:1032: warning: left shift count is negative

These are similarly benign - the cast warnings are due to some array-alignment code which only needs the bottom few bits of a pointer, and the shift-count warnings are a result of compiler speculation-type optimizations.

The various other (including non-x86) build modes are all slight variants of the above example procedure:

Once you have successfully linked a binary, I suggest you first try a spot-check at some smallish FFT length, say

time ./Mlucas -fftlen 192 -iters 100 -radset 0 -nthread 2

This particular testcase should produce the following 100-iteration residues, with some platform-dependent variability in the roundoff errors:
100 iterations of M3888509 with FFT length 196608 = 192 K
Res64: 71E61322CCFB396C. AvgMaxErr = 0.226967076. MaxErr = 0.281250000. Program: E3.0x
Res mod 2^36     =          12028950892
Res mod 2^35 - 1 =          29259839105
Res mod 2^36 - 1 =          50741070790
[If the residues differ from these internally-pretabulated 100-iteration ones, the code will emit a visually-loud error message.]
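To make the "Res64" and "Res mod" lines above concrete, here is a minimal Python sketch of how such check values can be derived from a full residue held as an ordinary integer. The function name `check_residues` is hypothetical and not part of Mlucas, and the toy input is not a real Mlucas residue. The last line is a consistency check on the values printed above: the low 36 bits of the listed Res64 reproduce the listed "Res mod 2^36" value.

```python
def check_residues(res):
    """Return the four check values for an integer residue (illustrative only)."""
    return {
        "Res64": "%016X" % (res & (2**64 - 1)),   # low 64 bits as 16 hex digits
        "mod 2^36": res % 2**36,
        "mod 2^35 - 1": res % (2**35 - 1),
        "mod 2^36 - 1": res % (2**36 - 1),
    }

# Toy input chosen so the reductions are easy to check by hand.
values = check_residues(2**40 + 5)
for name, value in values.items():
    print(name, "=", value)

# Cross-check of the sample output: Res64 mod 2^36 equals the printed value.
print(0x71E61322CCFB396C % 2**36 == 12028950892)
```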

Once you have a working binary you can play with #threads ... it is most efficient to gauge the resulting effect on throughput by running one or more of the self-test FFT-length ranges (e.g. small, medium, large) with a user-set thread count, as detailed below.


STEP 2 - PERFORMANCE-TUNE FOR YOUR MACHINE

For a complete list of Mlucas command line options, type 'Mlucas -h'.

After building the source code, the first thing that should be done is a set of self-tests to make sure the binary works properly on your system. During these self-tests, the code also collects various timing data which allow it to configure itself for optimal performance on your hardware. It does this by saving data about the optimal FFT radix combination at each FFT length tried in the self-test to a configuration file, named mlucas.cfg. Once this file has been generated, it will be read whenever the program is invoked to get the optimal-FFT data (specifically, the optimal set of radices into which to subdivide each FFT length) for the exponent currently being tested.

To perform the needed self-tests for a typical-user setup (which implies that you'll be either doing double-checking or first-time LL testing), first remove or rename any existing mlucas.cfg file from a previous code build/release in the run directory, then type

Mlucas -s m

This tells the program to perform a series of self-tests for FFT lengths in the 'medium' range, which currently (as of Fall 2013) means FFT lengths from 896K-3840K, which covers Mersenne numbers with exponents from ~17.5M-73M. Note that the code will automatically do the needed self-tests at a given FFT length if it fails to find an mlucas.cfg file, or fails to find an entry for the FFT length in question in that file (e.g. if you get a double-check assignment for an exponent < 20M, or a couple years from now, when exponents of first-time LL tests begin to exceed 73M), so in fact this step is optional. However, I still recommend running the above set of self-tests under unloaded or constant-load conditions before starting work on any real assignments, so as to get the most-reliable optimal-FFT data for your machine, and to be able to identify and work around any anomalous timing data. (See the example below for an illustration of that.)
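As a rough illustration of how FFT length relates to exponent size, the following Python sketch computes the average number of residue bits carried by each double, i.e. the exponent divided by the FFT length in doubles. The helper name `bits_per_word` is hypothetical. The first two inputs are the approximate range endpoints quoted above, and the third is the 192K spot-check case from Step 1.

```python
def bits_per_word(exponent, fft_len_k):
    """Average number of residue bits stored in each double of the FFT array."""
    return exponent / (fft_len_k * 1024)

cases = [(17_500_000, 896), (73_000_000, 3840), (3_888_509, 192)]
for p, n_k in cases:
    print("p = %d, FFT length %dK: %.2f bits per double"
          % (p, n_k, bits_per_word(p, n_k)))
```

The figures come out in the range of roughly 18.5 to 20 bits per double, with the larger FFT lengths carrying slightly fewer bits per word.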

Note: the default in automated self-test mode is to use as many threads as there are detected cores ... to use some number different from that you must add the -nthread flag to the command line. This "user custom" mode also requires you to specify the desired number of self-test iterations; you should use one of the 2 standard values, '-iters 100' and '-iters 1000' for which the code stores pretabulated results which it uses to validate (or reject) self-test results. 100 is nice for early-stage testing since the self-test will complete in roughly 1/10th the time, but 1000 is better once you have a good idea of optimal thread count on your system, because it yields a more-precise timing and is better at catching radix sets which may yield an unsafely high level of roundoff error for exponents near the upper limit of what the code allows for a given FFT length. Thus, to run the small, medium and large self-tests 2-threaded and with 100 iterations per individual subtest:

./Mlucas -s s -iters 100 -nthread 2
./Mlucas -s m -iters 100 -nthread 2
./Mlucas -s l -iters 100 -nthread 2

To follow that with a 4-threaded set of tests for purposes of timing comparison, first move the 2-threaded mlucas.cfg file under a different name, e.g.

mv mlucas.cfg mlucas.cfg.2thr
./Mlucas -s s -iters 100 -nthread 4
./Mlucas -s m -iters 100 -nthread 4
./Mlucas -s l -iters 100 -nthread 4
mv mlucas.cfg mlucas.cfg.4thr

Once you have determined the best thread count for your system, if this differs from the number of hardware cores, to get the program to use said #nthreads you must create a file 'nthreads.ini' in the same dir as you are running in, enter the desired #threads, and save the file.

Lastly, if you have run multiple-thread-count tests as above, prior to starting production LL-testing, link to the desired thread-specific .cfg file. For instance if you have 6 cores but find 4-threaded to yield the best timings using the above namings, 'ln -s mlucas.cfg.4thr mlucas.cfg'.
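To compare the saved per-thread-count files side by side, a small script along the following lines may help. It is only a sketch: the function names are hypothetical, the timings in the demo strings are made up, and the regular expression assumes the per-iteration label reads either "sec/iter" or "msec/iter".

```python
import re

# Matches the leading "FFT length, per-iteration time" part of a data line.
LINE = re.compile(r"^\s*(\d+)\s+m?sec/iter\s*=\s*([0-9.]+)")

def read_timings(cfg_text):
    """Map FFT length (in K) -> per-iteration time from mlucas.cfg-style text."""
    timings = {}
    for line in cfg_text.splitlines():
        m = LINE.match(line)
        if m:                        # comment and version lines do not match
            timings[int(m.group(1))] = float(m.group(2))
    return timings

def compare(cfg_a, cfg_b):
    """Return {fft_len: (time_a, time_b)} for FFT lengths present in both."""
    a, b = read_timings(cfg_a), read_timings(cfg_b)
    return {n: (a[n], b[n]) for n in sorted(a.keys() & b.keys())}

# Made-up timings for illustration only; in practice pass
# open("mlucas.cfg.2thr").read() and open("mlucas.cfg.4thr").read().
two_thr = "3.0x\n2048  sec/iter = 0.070\n2304  sec/iter = 0.080\n"
four_thr = "3.0x\n2048  sec/iter = 0.040\n2304  sec/iter = 0.047\n"
result = compare(two_thr, four_thr)
for n, (t2, t4) in result.items():
    print("%5dK: 2-thread %.3f, 4-thread %.3f, ratio %.2f" % (n, t2, t4, t2 / t4))
```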

Additional Notes:

If you want to do the self-tests of the various available radix sets for one particular FFT length, enter

Mlucas -s {FFT length in K}

For instance, to test all the FFT radices available at 704K, enter

Mlucas -s 704

The above single-FFT-length self-test feature is particularly handy if the binary you are using throws errors for one or more particular FFT lengths, which interrupt the complete self-test before it has a chance to complete the configuration file. In that case, one must skip the offending FFT length and go on to the next-higher one, and in this fashion build a .cfg file one FFT length at a time. (Note that each such test appends to any existing mlucas.cfg file, so make sure to comment out or delete any older entries for a given FFT length after running any new timing tests, if you plan to do any actual "production" LL testing.)

For SIMD-enabled (SSE2 and AVX) linux/GCC builds running on x86 platforms, the GCC compiler optimizer sometimes messes up the non-SIMD-enabled carry-step FFT radices, so during self-testing of a GCC build you may see occasional error messages of this kind:

M36987271 Roundoff warning on iteration 1, maxerr = 14.000000000000 FATAL ERROR...Halting test of exponent 36987271

As long as the SIMD-enabled radix combinations work - i.e. you get a valid mlucas.cfg file with no "gaps", such errors are ignorable.

[Dec 2011 - Update: For GCC builds I have replaced the old "add and subtract rounding constant" trick for effecting fast double-precision nearest-int with a call to the compiler intrinsic lrint macro. This - even when building in scalar (non-SSE2) mode - inlines a fast SSE2-based DNINT, which is roughly as fast as the above add/subtract trick, and most importantly, is not subject to being "optimized away" by the compiler. Thus, one should no longer see the above kinds of errors in GCC builds; the only remaining caveat related to the fused DFT-pass/normalize-and-propagate-carries routines in question is that the ones which were in the past affected by the above problem are ones where the code has not been SSE2-optimized (based on an expected cost/benefit analysis), so will be relatively slow. That is not an issue, because that simply means those radices will never make it into the "golden" cfg-file sets, thus they will never be used for actual primality tests.]

If you are running multiple copies of Mlucas, a copy of the mlucas.cfg file should be placed into each working directory (i.e. wherever you have a worktodo.ini file). Note that the program can run without this file, but with a proper configuration file (in particular one which was run under unloaded or constant-load conditions) it will run optimally at each runlength.

Format of the mlucas.cfg file:

What is contained in the configuration file? Well, let's let one speak for itself. The following mlucas.cfg file was generated on a 2.8 GHz AMD Opteron running RedHat 64-bit linux. Lines beginning with # are comments; the remaining lines hold the actual optimal-FFT-radix data:


	#
	# mlucas.cfg file
	# Insert comments as desired in lines beginning with a # or // symbol.
	#
	# First non-comment line contains program version used to generate this mlucas.cfg file;
	3.0x
	#
	# Remaining non-comment lines contain data about the optimal FFT parameters at each runlength on the host platform.
	# Each line below contains an FFT length in units of Kdoubles (i.e. the number of 8-byte floats used to store the
	# LL test residues for the exponent being tested), the best timing achieved at that FFT length on the host platform
	# and the range of per-iteration worst-case roundoff errors encountered (these should not exceed 0.35 or so), and the
	# optimal set of complex-FFT radices (whose product divided by 512 equals the FFT length in Kdoubles) yielding that timing.
	#
	2048  sec/iter =    0.134  ROE[min,max] = [0.281250000, 0.343750000]  radices =  32 32 32 32  0  0  0  0  0  0  [Any text offset from the list-ending 0 by whitespace is ignored]
	2304  sec/iter =    0.148  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36  8 16 16 16  0  0  0  0  0
	2560  sec/iter =    0.166  ROE[min,max] = [0.281250000, 0.312500000]  radices =  40  8 16 16 16  0  0  0  0  0
	2816  sec/iter =    0.188  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44  8 16 16 16  0  0  0  0  0
	3072  sec/iter =    0.222  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
	3584  sec/iter =    0.264  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
	4096  sec/iter =    0.300  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
(Note that as of Jun 2014 the per-iteration timing data written to mlucas.cfg file have been changed from seconds to milliseconds, but that change in scaling is immaterial with respect to the notes below).

You are free to modify or append data to the right of the # signs in the .cfg file and to add or delete comment lines beginning with a # as desired. For instance, one useful thing is to add information about the specific build and platform at the top of the file. Any text to the right of the 0-terminated radices list for each FFT length is similarly ignored, whether it is preceded by a # or // or not. (But there must be a whitespace separator between the list-ending 0 and any following text).
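The line format and the "product of radices divided by 512" rule described in the sample file can be checked with a short Python sketch such as the one below. The parser is hypothetical and is not how Mlucas itself reads the file. It assumes the field labels shown in the sample (with "msec/iter" accepted as an alternative time label) and stops reading radices at the first 0, so trailing text is ignored.

```python
import re
from math import prod

ENTRY = re.compile(
    r"^\s*(\d+)\s+m?sec/iter\s*=\s*([0-9.]+)"
    r"\s+ROE\[min,max\]\s*=\s*\[([0-9.]+),\s*([0-9.]+)\]"
    r"\s+radices\s*=\s*(.*)$"
)

def parse_entry(line):
    """Parse one data line; return None for comment, version or blank lines."""
    m = ENTRY.match(line)
    if m is None:
        return None
    radices = []
    for tok in m.group(5).split():
        if tok == "0" or not tok.isdigit():   # list ends at the first 0
            break
        radices.append(int(tok))
    return {
        "fft_len_k": int(m.group(1)),
        "time": float(m.group(2)),
        "roe": (float(m.group(3)), float(m.group(4))),
        "radices": radices,
    }

line = ("2304  sec/iter =    0.148  ROE[min,max] = [0.242187500, 0.281250000]"
        "  radices =  36  8 16 16 16  0  0  0  0  0  trailing text is ignored")
entry = parse_entry(line)
# Product of the complex radices divided by 512 gives the FFT length in K.
print(entry["radices"], prod(entry["radices"]) // 512)
```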

One important thing to look for in a .cfg file generated on your local system is non-monotone timing entries in the sec/iter (seconds per iteration at the particular FFT length) data. For instance, consider the following snippet from an example mlucas.cfg file:

	1536  sec/iter =    0.225
	1664  sec/iter =    0.244
	1792  sec/iter =    0.253
	1920  sec/iter =    0.299
	2048  sec/iter =    0.284

We see that the per-iteration time for runlength 1920K is actually greater than that for the next-larger vector length that follows it. If you encounter such occurrences in the mlucas.cfg file generated by the self-test run on your system, don't worry about it -- when parsing the cfg file the program always "looks one FFT length beyond" the default one for the exponent in question. If the timing for the next-larger-available runlength is less than that for the default FFT length, the program will use the larger runlength. The only genuinely problematic case with this scheme is if both the default and next-larger FFT lengths are slower than an even larger runlength further down in the file, but this scenario is exceedingly rare. (If you do encounter it, please notify the author and in the meantime just let the run proceed).
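The one-step lookahead rule just described can be sketched in a few lines of Python. This is an illustration of the rule as stated in the text, not the program's actual code; both function names are hypothetical, and the data are the five timings from the snippet above.

```python
def choose_fft_length(timings, default_k):
    """timings: list of (fft_len_k, time) sorted by FFT length.
    Return the FFT length picked by looking one entry beyond the default."""
    lengths = [n for n, _ in timings]
    i = lengths.index(default_k)
    if i + 1 < len(timings) and timings[i + 1][1] < timings[i][1]:
        return timings[i + 1][0]      # next-larger length is faster
    return default_k

def non_monotone(timings):
    """FFT lengths whose time exceeds that of the next-larger length."""
    return [a[0] for a, b in zip(timings, timings[1:]) if a[1] > b[1]]

cfg = [(1536, 0.225), (1664, 0.244), (1792, 0.253), (1920, 0.299), (2048, 0.284)]
print(non_monotone(cfg))              # [1920]
print(choose_fft_length(cfg, 1920))   # 2048
print(choose_fft_length(cfg, 1792))   # 1792
```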

Aside: This type of thing most often occurs for FFT lengths with non-power-of-2 leading radices (which are algorithmically less efficient than power-of-2 radices) just slightly less than a power-of-2 FFT length (e.g. 2048K = 2^21), and for FFT lengths involving a radix which is an odd prime greater than 7. It can also happen if for some reason the compiler does a relatively poorer job of optimization on a particular FFT radix, or if some FFT radix combinations happen to give better or worse memory-access and cache behavior on the system in question.


STEP 3 - RESERVE EXPONENTS FROM PRIMENET

Assuming your self-tests ran successfully, reserve a range of exponents from the GIMPS PrimeNet server. Here's the procedure (for less-experienced users, I suggest toggling between the PrimeNet links and my explanatory comments):

Each PrimeNet work assignment output line is in the form

{assignment type}={Unique assignment ID},{Mersenne exponent},{known to have no factors with base-2 logarithm less than},{p-1 factoring has/has-not been tried}

A pair of typical assignments returned by the server follows:

Assignment:
Test=DDD21F2A0B252E499A9F9020E02FE232,48295213,69,0
Explanation:
M48295213 has not been previously LL-tested (otherwise the assignment would begin with "DoubleCheck=" instead of "Test="). The long hexadecimal string is a unique assignment ID generated by the PrimeNet v5 server as an anti-poaching measure. The ",69" indicates that M48295213 has been trial-factored to depth 2^69 with no factors found. The 0 following the 69 indicates that p-1 factoring still needs to be done, but Mlucas currently does not support p-1 factoring, so perform a first-time LL test of M48295213.

Assignment:
DoubleCheck=B83D23BF447184F586470457AD1E03AF,22831811,66,1
Explanation:
M22831811 has already had a first-time LL test performed, been trial-factored to a depth of 2^66 and has had p-1 factoring attempted, with no small factors found, so perform a second LL test of M22831811 in order to validate (or refute) the results of the initial test. In case of mismatching residues for the first-time test and the double-check, a triple-check assignment would be generated by the server, whose format would however still read "DoubleCheck".

Copy the Test=... or DoubleCheck=... lines returned by the server into the worktodo.ini file, which must be in the same directory as the Mlucas executable (or a symbolic link to it) and the mlucas.cfg file. If this file does not yet exist, create it. If this file already has some existing entries, append any new ones below them.

Note that Mlucas makes no distinction between first-time LL tests and double-checks - this distinction is only important to the PrimeNet server.

Most exponents handed out by the PrimeNet server have already been trial-factored to the recommended depth, so in most cases, no additional factoring effort is necessary. If you have exponents that require additional trial factoring, you'll need to do the trial factoring using Prime95 on a PC, as Mlucas currently has no trial factoring capability.

If you wish to test some non-server-assigned prime exponent, you can simply enter the raw exponent on a line by itself in the worktodo.ini file.
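For readers who want to script around the worktodo.ini file, the line formats described in this step (Test=..., DoubleCheck=..., or a bare exponent) can be split into fields as in the following Python sketch. The function and the dictionary keys are hypothetical names chosen for illustration; they are not part of Mlucas or PrimeNet.

```python
def parse_assignment(line):
    """Parse one worktodo.ini line into a dict; raw exponents have no ID."""
    line = line.strip()
    if "=" not in line:
        return {"type": None, "id": None, "exponent": int(line)}
    kind, fields = line.split("=", 1)
    aid, exponent, tf_bits, pm1_done = fields.split(",")
    return {
        "type": kind,                    # "Test" or "DoubleCheck"
        "id": aid,                       # unique assignment ID
        "exponent": int(exponent),
        "tf_bits": int(tf_bits),         # no known factors below 2**tf_bits
        "pm1_done": pm1_done == "1",     # has p-1 factoring been tried?
    }

first = parse_assignment("Test=DDD21F2A0B252E499A9F9020E02FE232,48295213,69,0")
second = parse_assignment("DoubleCheck=B83D23BF447184F586470457AD1E03AF,22831811,66,1")
raw = parse_assignment("48295213")
print(first["type"], first["exponent"], first["tf_bits"], first["pm1_done"])
print(second["type"], second["exponent"], second["tf_bits"], second["pm1_done"])
print(raw["exponent"])
```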


STEP 4 - LUCAS-LEHMER TESTING

On a Unix or Linux system, cd to the directory containing the Mlucas executable (or a link to it), the worktodo.ini and the mlucas.cfg file and type

nice ./Mlucas &

The program will run silently in background, leaving you free to do other things or to log out. Every 10000 iterations, the program writes a timing to the "p{exponent}.stat" file (which is automatically created for each exponent), and writes the current residue and all other data it needs to pick up at this point (in case of a crash or powerdown) to a pair of restart files, named "p{exponent}" and "q{exponent}" (the second is a backup, in the rare event the first is corrupt). When the exponent finishes, the program writes the least significant 64 bits of the final residue (in hexadecimal form, just like Prime95) to the .stat and results.txt (master output) file. Any round-off or FFT convolution error warnings are written as they are detected both to the status and to the output file, thus preserving a record of them when the Lucas-Lehmer test of the current exponent is completed.
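For orientation, here is what a Lucas-Lehmer test and its 64-bit hexadecimal residue look like for tiny exponents. This is a minimal Python sketch using ordinary big-integer arithmetic; it is not how Mlucas performs the squarings and is only practical for small p. Both function names are hypothetical.

```python
def lucas_lehmer_residue(p):
    """Final LL residue for an odd prime p; 0 means 2**p - 1 is prime."""
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):            # p - 2 squarings, each reduced mod 2**p - 1
        s = (s * s - 2) % m
    return s

def res64(residue):
    """Least significant 64 bits of a residue as 16 hex digits."""
    return "%016X" % (residue & ((1 << 64) - 1))

for p in (7, 11, 13):
    r = lucas_lehmer_residue(p)
    print("M%d: Res64 = %s (%s)" % (p, res64(r), "prime" if r == 0 else "composite"))
```

A zero final residue identifies a Mersenne prime (here M7 and M13), while M11 = 2047 = 23 * 89 leaves a nonzero residue.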

ADDING NEW EXPONENTS TO THE WORKTODO.INI FILE: You may add or modify ALL BUT THE FIRST EXPONENT (i.e. the current one) in the worktodo.ini file while the program is running. When the current exponent finishes, the program opens the file, deletes the first entry and, if there is another exponent on what was line 2 (and now is line 1), starts work on that one.
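The queue behavior described above (drop the finished first entry, then continue with whatever is now on line 1) can be pictured with the following Python sketch. It is an illustration only, with a hypothetical function name, and it runs against a scratch file rather than a live worktodo.ini.

```python
import os
import tempfile

def advance_worktodo(path):
    """Drop the first entry of the file; return the new first entry or None."""
    with open(path) as f:
        entries = [ln.strip() for ln in f if ln.strip()]
    remaining = entries[1:]
    with open(path, "w") as f:
        f.writelines(e + "\n" for e in remaining)
    return remaining[0] if remaining else None

# Demo on a scratch file holding two raw exponents.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "worktodo.ini")
    with open(path, "w") as f:
        f.write("48295213\n22831811\n")
    next_entry = advance_worktodo(path)   # first exponent finished
    after_last = advance_worktodo(path)   # second exponent finished
    print(next_entry, after_last)
```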


STEP 5 - SEND YOUR RESULTS TO PRIMENET

To report results (either after finishing a range, or as they come in), login to your PrimeNet account and then proceed to the Manual Test Results Check In. Paste the results you wish to report (i.e. the final line of the p*.stat file, or one or more lines of the results.txt file) into the large window immediately below.

If for some reason you need more time than the 180-day default to complete a particular assignment, go to the Manual Test Time Extension page and enter the assignment there.


TRACKING YOUR CONTRIBUTION

You can track your overall progress (for both automated and manual testing work) at the PrimeNet server's producer page. Note that this does not include pre-v5-server manual test results. (That includes most of my GIMPS work, in case you were feeling personally slighted ;).


ALGORITHMIC Q & A