
Welcome to the Great Internet Mersenne Prime Search! (The not-PC-only version ;)


This ftp site contains Ernst Mayer's C source code for performing Lucas-Lehmer tests and sieve-based trial-factoring (TF) of prime-exponent Mersenne numbers. (Although see notes below about the GPU clients now being preferable for sieving.) In short, everything you need to search for Mersenne primes on your Intel, AMD or non-x86-CPU-based computer!

Mlucas is an open-source program for primality testing of Mersenne numbers in search of a world-record prime. You may use it to test any suitable number as you wish, but it is preferable that you do so in a coordinated fashion, as part of the Great Internet Mersenne Prime Search (GIMPS). Note that Mlucas is not (yet) as efficient as the main GIMPS client, George Woltman's Prime95 program (a.k.a. mprime for the linux version), but that program is not truly open-source, and requires the user to abide by the prize-sharing rules set by its author, should a user be lucky enough to find a new prime eligible for one of the monetary prizes offered by the Electronic Frontier Foundation. Prime95 is also only available for platforms based on the x86 processor architecture.


Quick Find Guide:


Recent News:


15 Jun 2017: v17.0 released: This supports Intel's next-gen AVX-512 vector arithmetic, which will appear in the consumer market in the form of the new Core i9 series of processors. I did all the associated code development using a crowdfunded Intel Knights Landing manycore workstation, which means the code uses only the basic AVX-512F Foundation instruction subset - the additional instruction subsets supported in later iterations (e.g. Skylake Xeon and Core i9) offer little performance gain as far as the vector FFT code at the heart of Mlucas is concerned.

Special thanks to David Stanfill for hosting/tech-supporting the aforementioned KNL, as well as generously providing access to a 32-core AVX2-running Xeon and one of the new AMD Ryzen octocore systems for build and testing; thanks also to Mark Rose for providing a pair of Primenet-server Python scripts which I used as templates for creating my Mlucas-customized py-script for automated assignments management, and to Gord Palameta for beta-testing the release on one of the new AVX-512-capable Google Compute Engine cloud server nodes.

Additional changes in 17.0:

January 2016: 49th Known Mersenne prime verified by Serge Batalov on an Amazon Web Services EC2 node using a multithreaded AVX2-based build of Mlucas.


2015: Mlucas 14.1 becomes an official Debian package. Thanks to Alex Vong for the initiative here. https://packages.debian.org/stretch/math/mlucas

Descriptions of recent previous code releases:



General Questions:

For general questions about the math behind the Lucas-Lehmer test, general primality testing or related topics (and also no small number of unrelated topics, such as this one following in the literary-humor footsteps of Ambrose Bierce), check out the Mersenne Forum.

For discussions specifically about Mlucas at the above venue, see the Mlucas subforum there.


Windows Users:

Mlucas does not support building with native Windows tools, but Windows users can download a pair of popular freeware packages to provide themselves with a Linux/GCC-like environment in which to build and run the code. (Thanks to Mersenne-forum member Anders Höglund for the setup here.)

First, you'll need a suitable archiver utility which handles various common Linux compression formats along with the Linux 'tar' (tape-archive, its name a historic artifact) command. I use the freeware 7zip package, which can handle most Linux compression formats including .xz, .7z and .bz2. Download to your C-drive and run the extractor .exe.

Next, download the msys2-x86_64 package, which provides both the needed Linux emulation environment and the underlying MINGW compiler-tools installation. After downloading to c:, click to run the self-extractor.

Note that in the ensuing package-install and configuration steps, you will need to be connected to the Internet, and will need to quit and restart MSYS2 several times. I restart via the Start Menu → All Programs → MSYS2 64bit. When I click on the last category, a dropdown menu appears with these 3 items, of which you want the bottom-most, bold-highlighted one:

MSYS2 MinGW 32-bit
MSYS2 MinGW 64-bit
MSYS2 MSYS

From the resulting command shell (and with a working internet connection), run these package-management commands in MSYS2, replying 'Y' to any do-you-wish-to-go-ahead-and-install prompts:
pacman -Syu (then exit & restart MSYS2)
pacman -Su (then exit & restart MSYS2)
Lastly, install the compiler and python-scripting tools:
pacman -S mingw-w64-x86_64-gcc
pacman -S mingw-w64-x86_64-python2
...and do one small manual edit of the c:\msys64\etc\profile (text) file, to add the snippet ':/mingw64/bin' (including the leading ':' separator) to the end of the following line, so that it reads:
MSYS2_PATH="/usr/local/bin:/usr/bin:/bin:/mingw64/bin"
Then do a final exit & restart of MSYS2 and you are ready to go. Everything is located inside the C:\msys64 folder; no additional c:\mingw64 folder installation is needed, as it was with MSYS.

To test that everything is set up properly, type 'which gcc' in your just-opened shell. That should point to /c/mingw/bin/gcc.exe, and 'gcc -v' should show the version of the compiler in your installation on the final line of screen output.


STEP 1 - DOWNLOAD AND BUILD THE CODE

To do Lucas-Lehmer tests, you'll need to build the latest Mlucas C source code release. First get the release tarball, available in 2 different-zip-based forms - Windows users should have already downloaded the above-linked 7zip freeware archiver utility, so should just use that in conjunction with the smaller xz-compressed tarchive:


Windows Users: Assuming you successfully installed MSYS2 as described above, everything below should work for you, except that the MSYS2/MINGW emulation environment does not support multithreaded builds. Thus just select the appropriate SIMD vector-mode for your x86 processor using the /proc/cpuinfo-based procedure described below (or none if non-x86), and omit -DUSE_THREADS from your compile statement.

Once you have the Mlucas tarball downloaded and unzipped to your c-drive, cd to it in the MSYS2 shell via 'cd /c/mlucas_v17/src', where '/c' is MSYS2's syntax for the C: drive. (Similarly, /e points to the removable-media mount point, which is handy for unnetworked 'sneakernetting' of files between your MSYS2-rendered Windows filesystem and a USB flash drive.) Type 'ls' to list the files in your src-subdirectory.
Determining the SIMD build mode using the /proc/cpuinfo file: To see if your x86 CPU (either Intel or AMD) supports single-instruction-multiple-data (SIMD) vector arithmetic (and if so, what the highest-supported relevant SIMD level is), type the following regular-expression-search commands, stopping as soon as you get a hit (a line of text containing the substring-being-searched-for will be echoed to stdout):
grep avx512 /proc/cpuinfo
grep avx2 /proc/cpuinfo
grep avx /proc/cpuinfo
grep sse2 /proc/cpuinfo
Whichever of these gave you the first hit, you will use -DUSE_[capitalized search substring] in your compile command line, e.g. if grepping for 'avx2' gave you the first hit, you use -DUSE_AVX2 in your compile command.
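For those who prefer a script to typing the greps one at a time, here is a minimal Python sketch of the same first-hit rule. The function name simd_define is hypothetical (it is not part of Mlucas); it does the same plain substring search as grep, in the same highest-to-lowest order:

```python
# Hypothetical helper, not part of Mlucas: applies the grep sequence above
# to the text of /proc/cpuinfo and returns the matching compile flag.
SIMD_LEVELS = ["avx512", "avx2", "avx", "sse2"]  # highest level first


def simd_define(cpuinfo_text):
    """Return '-DUSE_<LEVEL>' for the first substring found, or None."""
    text = cpuinfo_text.lower()
    for level in SIMD_LEVELS:
        if level in text:          # same substring match that grep performs
            return "-DUSE_" + level.upper()
    return None                    # no hit: build without a SIMD flag
```

On Linux or MSYS2 you would call it as simd_define(open('/proc/cpuinfo').read()).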

Mac OS X has no /proc/cpuinfo file, so Mac users will need to [Apple Icon] → About This Mac, then compare the processor type displayed in the resulting dialog box against the following Wikipedia entries:

CPUs with AVX-512                   • CPUs with AVX2                   • CPUs with AVX                   • CPUs with SSE2


Building: The build procedure is so simple, there is little point in the extra script-infrastructure and maintenance work needed by the usual linux ./configure-then-make procedure - let's illustrate using a multithreaded x86/SSE2 build under 64-bit Linux. (Again, Windows users must omit -DUSE_THREADS from their compile statement.) I suggest creating an 'obj' subdir within the src-directory (or specific-build-mode-named object-subdirs if you want to try multiple build modes, say obj_avx2 and obj_avx512 on newer Intel x86 systems), then cd'ing into the obj-dir and proceeding like so (again, this example is specifically for an SSE2 vector-SIMD-arithmetic build):

gcc -c -O3 -DUSE_SSE2 -DUSE_THREADS ../*.c >& build.log
grep -i error build.log
[under Windows, just open build.log in a text editor and do case-insensitive search for the string 'error']
[Assuming above grep comes up empty] gcc -o Mlucas *.o -lm -lpthread -lrt
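If you would rather not search build.log by hand, the following Python sketch does the same case-insensitive search as 'grep -i error'. The function name error_lines is made up for this illustration:

```python
# Hypothetical helper, not part of Mlucas: a stand-in for 'grep -i error build.log'.
def error_lines(log_text):
    """Return the lines of a build log that contain 'error', ignoring case."""
    return [line for line in log_text.splitlines() if "error" in line.lower()]
```

An empty list from error_lines(open('build.log').read()) corresponds to the grep coming up empty, in which case you can go on to the link step.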


The various other (including non-x86) build modes are all slight variants of the above example procedure:
Those of you masochistic enough to delve deeper into the build.log should expect to see some compiler warnings, mostly of the "type-punned pointer", "signed/unsigned int", "unused variable" and "variable set but not used" (the latter with GCC 4.7 and later) varieties. (I try to keep on top of the latter kinds, but with multiple build modes which use various partially-overlapping subsets of variables, were I to set a goal of no such warnings, it would be a nearly full-time job and leave little time for actual new-code development.) The first of these is related to the quad-float emulation code used for high-precision double-float-const-initializations. Others are mainly of the following kind:

[various]: warning: cast from pointer to integer of different size
twopmodq80.c: In function `twopmodq78_3WORD_DOUBLE':
twopmodq80.c:1032: warning: right shift count >= width of type
twopmodq80.c:1032: warning: left shift count is negative

These are similarly benign - the cast warnings are due to some array-alignment code which only needs the bottom few bits of a pointer, and the shift-count warnings are a result of compiler speculation-type optimizations.

Once you have successfully linked a binary, I suggest you first try a spot-check at some smallish FFT length, say

./Mlucas -fftlen 192 -iters 100 -radset 0

This particular testcase should produce the following 100-iteration residues, with some platform-dependent variability in the roundoff errors:
100 iterations of M3888517 with FFT length 196608 = 192 K
Res64: 579D593FCE0707B2. AvgMaxErr = 0.260239955. MaxErr = 0.343750000. Program: E14.1
Res mod 2^36     =          67881076658
Res mod 2^35 - 1 =          21674900403
Res mod 2^36 - 1 =          42893438228
[If the residues differ from these internally-pretabulated 100-iteration ones, the code will emit a visually-loud error message.]
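As a quick consistency check on the sample output above, the 'Res mod 2^36' value can be recomputed from Res64. This sketch assumes, as is conventional for GIMPS residues, that Res64 holds the low-order 64 bits of the full residue. The two 'mod 2^k - 1' values cannot be recovered this way, since they depend on the entire residue:

```python
# Assumption: Res64 is the low-order 64 bits of the full LL residue,
# so its low 36 bits are the full residue mod 2^36.
res64 = int("579D593FCE0707B2", 16)   # Res64 from the sample output above
res_mod_2_36 = res64 % 2**36          # keep only the low 36 bits
print(res_mod_2_36)                   # prints 67881076658, matching the sample output
```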
If that works, try rerunning the same case, now with 2 threads rather than the default single-threaded:

./Mlucas -fftlen 192 -iters 100 -radset 0 -nthread 2

Intel multicore-CPU users should see a nearly 2-fold speedup running this way, but AMD users won't. That's because '-nthread 2' really translates to 'run 2-threaded, with thread affinities set to logical CPU cores 0 and 1'. By 'logical cores' we mean the multiple (typically 2, but sometimes more) 'virtual cores' mapped to each physical CPU core in modern 'hyperthreaded' (Intel's term) CPU architectures. The Intel numbering system here is that on a system with n physical cores, physical CPU core 0 maps to logical cores 0 and n; physical CPU core 1 maps to logical cores 1 and n+1, etc. AMD uses a different logical-core numbering convention than Intel, whereby physical CPU core 0 maps to logical cores 0 and 1; physical CPU core 1 maps to logical cores 2 and 3, and so forth. The Intel-specificity of the Mlucas -nthread option is one reason it is deprecated (still supported but recommended-against) in v17 and beyond; another is that it does not permit, e.g., setting the processor affinity of one job to physical core 0, a second to physical core 1, etc.
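The two numbering conventions just described can be written down in a few lines of Python. This is only an illustration of the mapping, assuming exactly 2 logical cores per physical core; the function name logical_cores is made up here:

```python
# Illustration only; assumes 2 logical cores per physical core.
def logical_cores(physical_core, n_physical, scheme):
    """Return the pair of logical-core indices sharing one physical core."""
    if scheme == "intel":
        # Intel: physical core k hosts logical cores k and k + n
        return [physical_core, physical_core + n_physical]
    if scheme == "amd":
        # AMD: physical core k hosts logical cores 2k and 2k + 1
        return [2 * physical_core, 2 * physical_core + 1]
    raise ValueError("unknown scheme: " + scheme)
```

For example, on a quad-core system logical_cores(1, 4, 'intel') gives [1, 5], while logical_cores(1, 4, 'amd') gives [2, 3].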

For these reasons v17 introduces a new and much-more-flexible flag '-cpu', which accepts any mix of comma-separated individual core indices and core-index ranges of form low:high and low:high:stride, where if stride is omitted it defaults to 1, and if high is also omitted, it means "run 1 thread on logical core [low]". Thus for our Intel user, -nthread 2 is equivalent to -cpu 0:1, but now our user can run a second 2-threaded job using -cpu 2:3 and be sure that the two runs are not competing for the same CPU cores. Our AMD user will similarly see no runtime benefit from replacing -nthread 2 with -cpu 0:1 (since both have the same effect of overloading a single physical CPU core), but will find that -cpu 0,2 (logical cores 0 and 2, i.e. one thread on each of physical cores 0 and 1) gives the expected 2-threaded speedup. (Note that by the range syntax, -cpu 0:2 would instead mean the three cores 0, 1 and 2.)
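To make the -cpu syntax concrete, here is a small Python sketch which expands an argument string into the list of logical-core indices it denotes. It follows the rules as stated in the paragraph above; it is not the parser used inside Mlucas, and the name parse_cpu_arg is hypothetical:

```python
# Sketch of the -cpu syntax as described above; not Mlucas's own parser.
def parse_cpu_arg(arg):
    """Expand e.g. '0,2,4,6' or '0:6:2' into a list of logical-core indices."""
    cores = []
    for item in arg.split(","):
        parts = [int(p) for p in item.split(":")]
        if len(parts) == 1:                      # single core index
            cores.append(parts[0])
        elif len(parts) in (2, 3):               # low:high or low:high:stride
            low, high = parts[0], parts[1]
            stride = parts[2] if len(parts) == 3 else 1
            cores.extend(range(low, high + 1, stride))   # high is inclusive
        else:
            raise ValueError("bad -cpu item: " + item)
    return cores
```

For instance, parse_cpu_arg('0:1') yields [0, 1], and parse_cpu_arg('0:6:2') yields [0, 2, 4, 6].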



STEP 2 - PERFORMANCE-TUNE FOR YOUR MACHINE

[For a complete list of Mlucas command line options, type 'Mlucas -h'; note the topical help-submenu options, new in v17.]

After building the source code, the first thing that should be done is a set of self-tests to make sure the binary works properly on your system. During these self-tests, the code also collects various timing data which allow it to configure itself for optimal performance on your hardware. It does this by saving data about the optimal FFT radix combination at each FFT length tried in the self-test to a configuration file, named mlucas.cfg. Once this file has been generated, it will be read whenever the program is invoked to get the optimal-FFT data (specifically, the optimal set of radices into which to subdivide each FFT length) for the exponent currently being tested.

To perform the needed self-tests for a typical-user setup (which implies that you'll be either doing double-checking or first-time LL testing), first remove or rename any existing mlucas.cfg file from a previous code build/release in the run directory, then type

Mlucas -s m >& selftest.log

This tells the program to perform a series of self-tests for FFT lengths in the 'medium' range, which currently means FFT lengths from 1024K-7680K, covering Mersenne numbers with exponents from 20M - 143M. You should run the self-tests under unloaded or constant-load conditions before starting work on any real assignments, so as to get the most-reliable optimal-FFT data for your machine, and to be able to identify and work around any anomalous timing data. (See the example below for an illustration of that.) This may take a while, especially in single-threaded mode; you can monitor progress by opening the mlucas.cfg file in an editor and watching the various-FFT-length entries get added as each set of tests at a given FFT length completes.

When done, please check the resulting selftest.log file for error messages. You should expect to see a few "***** Excessive level of roundoff error detected - this radix set will not be used. *****" messages, but a whole lot of such, or residue-mismatch or other kinds of errors, means that something has likely gone awry in your build. This can be something as mundane as the compiler using unsafe optimizations for one or more FFT-radix functions, or something more serious. In such cases, please contact me, the program author, and attach zipped copies of your build.log and selftest.log, along with information about your compiler version and compute platform (CPU and OS).

Users who just want to start doing GIMPS work in the default one-single-threaded-job-per-physical-CPU mode should skip down to the "Format of the mlucas.cfg file" subsection. Those who want to see if multithreaded running offers any gain on their system should read the following subsection.


Advanced Users: Note that the default in automated self-test mode is the same as for production run mode: to use a single thread running on a single physical core, using 100-iteration timing runs of the various FFT lengths and radix combinations at each length. You may also explicitly specify the desired number of self-test iterations, but for this to produce a .cfg file you must use one of the 3 standard values, '-iters 100', '-iters 1000' or '-iters 10000', for which the code stores pretabulated results which it uses to validate (or reject) self-test results. 100 is nice for 1- and perhaps 2-thread testing, but on fast systems with >= 2 threads, 1000 is better, because it yields a more-precise timing and is better at catching radix sets which may yield an unsafely high level of roundoff error for exponents near the upper limit of what the code allows for a given FFT length. Thus, to run the medium self-tests 2-threaded and with 1000 iterations per individual subtest, first save the 1-threaded mlucas.cfg file under a different name, e.g. mlucas.cfg.1thr. Then, on Intel systems:
./Mlucas -s m -iters 1000 -nthread 2
or, equivalently:
./Mlucas -s m -iters 1000 -cpu 0:1
On systems using a different core-numbering system than Intel you will need to modify the core indices in multithread runs suitably, e.g. on AMD our 2-threaded timings should use
./Mlucas -s m -iters 1000 -cpu 0,2
(By the range syntax, -cpu 0:2 would instead select the three cores 0, 1 and 2.) On systems other than Intel and AMD a quick single-case timing experiment should suffice to reveal whether the physical-core-numbering scheme is like that of Intel or AMD, or perhaps something else. Compare the runtimes for these:

./Mlucas -fftlen 192 -iters 100 -radset 0 [This is your 1-thread baseline timing]
./Mlucas -fftlen 192 -iters 100 -radset 0 -cpu 0:1
./Mlucas -fftlen 192 -iters 100 -radset 0 -cpu 0,2

If -cpu 0:1 gives a clearly better timing - in the sense that the runtimes are on average < 0.5x the 1-thread ones - than both the 1-thread run and -cpu 0,2, use the Intel core-numbering scheme. If -cpu 0,2 gives the clear best timing, use the AMD numbering scheme. If neither of the 2-threaded runs gives a timing better than (say) 0.6x the 1-thread timing, you should stick to single-threaded running, 1 job per physical core.
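The rule of thumb above can be condensed into a few lines of Python. This is a deliberately simplified sketch: it collapses the "clearly better" judgment into the single 0.6x cutoff mentioned above, and the function and parameter names are made up for illustration:

```python
# Simplified sketch of the timing comparison above; names are hypothetical.
def pick_threading(t_single, t_intel_style, t_amd_style, cutoff=0.6):
    """Return 'intel', 'amd' or 'single' from three runtimes in the same unit.

    t_single      : 1-thread baseline runtime
    t_intel_style : runtime of the 2-thread run on logical cores 0 and 1
    t_amd_style   : runtime of the 2-thread run on logical cores 0 and 2
    """
    best = min(t_intel_style, t_amd_style)
    if best > cutoff * t_single:       # neither 2-thread run is good enough
        return "single"
    return "intel" if t_intel_style <= t_amd_style else "amd"
```

If the two 2-threaded timings are close to each other, treat the answer with suspicion and rerun the timings rather than trusting the sketch.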

Once your 2-threaded self-tests complete, for the total system throughput to beat the simple one-single-threaded-job-per-physical-CPU mode, the per-iteration timings in the 2-thread .cfg file need to be on average no more than half those in the single-thread .cfg file. If they are not, it's probably best to just go single-threaded. Rename the 2-threaded mlucas.cfg file mlucas.cfg.2thr, and either remove the .1thr extension you added to the 1-thread .cfg file, or place a soft-link to that one in each of your production run directories, under the alias mlucas.cfg. (E.g. 'mkdir run0 && cd run0 && ln -s ../mlucas.cfg.1thr mlucas.cfg'.)

To follow the 2-threaded self-test with a 4-threaded one for purposes of timing comparison, first save the 2-threaded mlucas.cfg file under a different name, e.g. mlucas.cfg.2thr. Then on Intel:

./Mlucas -s m -iters 1000 -cpu 0:3

or on AMD, where the following 3 -cpu argument sets are all equivalent, and illustrate the various available syntaxes:

./Mlucas -s m -iters 1000 -cpu 0,2,4,6
./Mlucas -s m -iters 1000 -cpu 0:6:2
./Mlucas -s m -iters 1000 -cpu 0:7:2 [think of C loop of form for(i = 0; i <= 7; i += 2)]

And don't forget to

mv mlucas.cfg mlucas.cfg.4thr

For 4-threaded to give better total throughput than four single-threaded jobs, the .4thr timings need to be roughly 3.5x or more faster than the .1thr ones. The fuzzy-factor here is due to memory contention effects, whereby multiple 1-thread runs will compete for the same system memory bandwidth and slow each other down. Starting four 1-thread production runs and letting them run through the first several 10000-iteration checkpoints will give you per-iteration timings you can more fairly compare to those for the same FFT length in the mlucas.cfg.4thr file.
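To compare two saved .cfg files along these lines, one can average the per-FFT-length speedups. The sketch below assumes the timings have already been read into dictionaries keyed by FFT length in K; the name average_speedup and the sample numbers in the usage note are made up:

```python
# Sketch only: inputs are {FFT length in K: time per iteration} dictionaries,
# e.g. built by hand from mlucas.cfg.1thr and mlucas.cfg.4thr.
def average_speedup(timings_1thr, timings_nthr):
    """Mean of (1-thread time / n-thread time) over the shared FFT lengths."""
    common = sorted(set(timings_1thr) & set(timings_nthr))
    if not common:
        raise ValueError("no FFT lengths in common")
    ratios = [timings_1thr[k] / timings_nthr[k] for k in common]
    return sum(ratios) / len(ratios)
```

With made-up timings {2048: 40, 4096: 80} for 1 thread and {2048: 10, 4096: 25} for 4 threads, the average speedup is about 3.6, which just clears the rough 3.5x bar quoted above.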

Additional Notes:

If you want to do the self-tests of the various available radix sets for one particular FFT length, enter

Mlucas -s {FFT length in K} -iters [100 | 1000 | 10000]

For instance, to test all FFT-radix combos supported for FFT length 704K for 10000 iterations each, enter

Mlucas -s 704 -iters 10000

The above single-FFT-length self-test feature is particularly handy if the binary you are using throws errors for one or more particular FFT lengths, which interrupt the complete self-test before it has a chance to complete the configuration file. In that case, after notifying me (please!), the user must skip the offending FFT length and go on to the next-higher one, and in this fashion build a .cfg file one FFT length at a time. (Note that each such test appends to any existing mlucas.cfg file, so make sure to comment out or delete any older entries for a given FFT length after running any new timing tests, if you plan to do any actual "production" LL testing.)

Overloading of Physical Cores:

On some platforms running 2 threads per physical core may offer some performance benefit. It is difficult to predict in advance when this will be the case: For example, on my Intel Haswell quad I get the best performance from running one thread per physical core, but on my dual-core Intel Broadwell NUC, using 4 threads, thus 2 threads per physical core, gives a 5-10% throughput boost over 1-thread per physical core. On AMD Ryzen, I not only see no gain from, but observe a pronounced deterioration in throughput from running more than 1 thread per physical core. On the just-released Google Cloud Skylake Xeon instances (which support the new AVX-512 instruction set), my code gets a huge (nearly 2-fold) throughput boost from using 2 threads per physical core.

To experiment with this yourself, you can again use a small set of self-tests, though I recommend using an FFT length for these which is reflective of current GIMPS assignments. (As I write this, that means 4096K and 4608K FFT lengths for first-time tests and 2304K and 2560K for double-checks.) It is also crucial to understand the CPU vendor's core numbering scheme here: On an Intel n-physical-core system, threads 0 and n map to physical core 0, threads 1 and n+1 map to physical core 1, and so forth through physical core n-1. On AMD, threads 0 and 1 map to physical core 0, threads 2 and 3 map to physical core 1, etc. Thus if you want to gauge whether overloading will help for your GIMPS assignment, e.g. if you are testing at 4096K, try a targeted single-FFT-length set of self-tests at that length (again, after saving any existing mlucas.cfg file under a suitable name to keep it from being appended to by this test):

Intel n-core: ./Mlucas -fftlen 4096 -iters 1000 -cpu 0,n (Insert your system's value of n, e.g. on a quad-core use -cpu 0,4, which puts both threads on physical core 0)

AMD n-core: ./Mlucas -fftlen 4096 -iters 1000 -cpu 0:1

Then compare the resulting mlucas.cfg file entry's timing against that for the same FFT length in the mlucas.cfg.1thr file you should have saved previously.


Format of the mlucas.cfg file:

If you are running multiple copies of Mlucas, a copy of the mlucas.cfg file should be placed into each working directory (i.e. wherever you have a worktodo.ini file). Note that the program can run without this file, but with a proper configuration file (in particular one which was generated under unloaded or constant-load conditions) it will run optimally at each runlength.

What is contained in the configuration file? Well, let's let one speak for itself. The following mlucas.cfg file was generated on a 2.8 GHz AMD Opteron running RedHat 64-bit linux. I've italicized and colorized the comments to set them off from the actual optimal-FFT-radix data:


	#
	# mlucas.cfg file
	# Insert comments as desired in lines beginning with a # or // symbol, as long as such commenting occurs below line 1, which is reserved.
	#
	# First non-comment line contains program version used to generate this mlucas.cfg file;
	14.1
	#
	# Remaining non-comment lines contain data about the optimal FFT parameters at each runlength on the host platform.
	# Each line below contains an FFT length in units of Kdoubles (i.e. the number of 8-byte floats used to store the
	# LL test residues for the exponent being tested), the best timing achieved at that FFT length on the host platform
	# and the range of per-iteration worst-case roundoff errors encountered (these should not exceed 0.35 or so), and the
	# optimal set of complex-FFT radices (whose product divided by 512 equals the FFT length in Kdoubles) yielding that timing.
	#
	2048  sec/iter =    0.134  ROE[min,max] = [0.281250000, 0.343750000]  radices =  32 32 32 32  0  0  0  0  0  0  [Any text offset from the list-ending 0 by whitespace is ignored]
	2304  sec/iter =    0.148  ROE[min,max] = [0.242187500, 0.281250000]  radices =  36  8 16 16 16  0  0  0  0  0
	2560  sec/iter =    0.166  ROE[min,max] = [0.281250000, 0.312500000]  radices =  40  8 16 16 16  0  0  0  0  0
	2816  sec/iter =    0.188  ROE[min,max] = [0.328125000, 0.343750000]  radices =  44  8 16 16 16  0  0  0  0  0
	3072  sec/iter =    0.222  ROE[min,max] = [0.250000000, 0.250000000]  radices =  24 16 16 16 16  0  0  0  0  0
	3584  sec/iter =    0.264  ROE[min,max] = [0.281250000, 0.281250000]  radices =  28 16 16 16 16  0  0  0  0  0
	4096  sec/iter =    0.300  ROE[min,max] = [0.250000000, 0.312500000]  radices =  16 16 16 16 32  0  0  0  0  0
Note that as of Jun 2014 the per-iteration timing data written to mlucas.cfg file have been changed from seconds to milliseconds, but that change in scaling is immaterial with respect to the notes below.

You are free to modify or append data to the right of the # signs in the .cfg file and to add or delete comment lines beginning with a # as desired. For instance, one useful thing is to add information about the specific build and platform at the top of the file. Any text to the right of the 0-terminated radices list for each FFT length is similarly ignored, whether it is preceded by a # or // or not. (But there must be a whitespace separator between the list-ending 0 and any following text).
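For readers who want to inspect a .cfg file programmatically, here is a Python sketch which parses one data line in the format shown above and checks the stated relation between the radices and the FFT length (product of radices divided by 512 equals the length in Kdoubles). The function name and the returned field names are made up, and accepting an 'msec/iter' label for newer files is my assumption about how the post-2014 millisecond timings are labeled:

```python
import re

# Sketch only; not Mlucas's own parser. The optional 'm' in 'm?sec/iter'
# is an assumption about how millisecond timings are labeled.
CFG_LINE = re.compile(
    r"^\s*(\d+)\s+m?sec/iter\s*=\s*([\d.]+)\s+"
    r"ROE\[min,max\]\s*=\s*\[\s*([\d.]+)\s*,\s*([\d.]+)\s*\]\s+"
    r"radices\s*=\s*(.*)$"
)


def parse_cfg_line(line):
    """Parse one data line; return None for comments and the version line."""
    m = CFG_LINE.match(line)
    if m is None:
        return None
    radices = []
    for token in m.group(5).split():
        if token == "0":           # list-ending 0: all later text is ignored
            break
        radices.append(int(token))
    product = 1
    for r in radices:
        product *= r
    fft_k = int(m.group(1))
    return {
        "fft_k": fft_k,
        "time_per_iter": float(m.group(2)),
        "roe_min": float(m.group(3)),
        "roe_max": float(m.group(4)),
        "radices": radices,
        "radix_product_ok": product == 512 * fft_k,
    }
```

Applied to the 2048K line of the example file, this yields radices [32, 32, 32, 32], whose product 1048576 is indeed 512 x 2048.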

One important thing to look for in a .cfg file generated on your local system is non-monotone timing entries in the sec/iter (seconds per iteration at the particular FFT length) data. For instance, consider the following snippet from an example mlucas.cfg file (to which I've added some boldface highlighting):

	1536  sec/iter =    0.225
	1664  sec/iter =    0.244
	1792  sec/iter =    0.253
	1920  sec/iter =    0.299
	2048  sec/iter =    0.284

We see that the per-iteration time for runlength 1920K is actually greater than that for the next-larger vector length that follows it. If you encounter such occurrences in the mlucas.cfg file generated by the self-test run on your system, don't worry about it -- when parsing the cfg file the program always "looks one FFT length beyond" the default one for the exponent in question. If the timing for the next-larger-available runlength is less than that for the default FFT length, the program will use the larger runlength. The only genuinely problematic case with this scheme is if both the default and next-larger FFT lengths are slower than an even larger runlength further down in the file, but this scenario is exceedingly rare. (If you do encounter it, please notify the author and in the meantime just let the run proceed).
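The look-one-length-ahead behavior described above can be sketched as follows. This is a paraphrase of the described logic in Python, not the program's actual code, and the name choose_fft_length is made up:

```python
# Paraphrase of the look-ahead rule described above; not Mlucas's own code.
def choose_fft_length(timings, default_k):
    """timings: {FFT length in K: time per iteration}; default_k must be a key."""
    lengths = sorted(timings)
    i = lengths.index(default_k)
    # Look exactly one FFT length beyond the default one
    if i + 1 < len(lengths) and timings[lengths[i + 1]] < timings[default_k]:
        return lengths[i + 1]
    return default_k
```

With the snippet above, a default length of 1920K is bumped up to 2048K (0.284 beats 0.299), whereas a default of 1792K is left alone.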

Aside: This type of thing most often occurs for FFT lengths with non-power-of-2 leading radices (which are algorithmically less efficient than power-of-2 radices) just slightly less than a power-of-2 FFT length (e.g. 2048K = 2^21), and for FFT lengths involving a radix which is an odd prime greater than 7. It can also happen if for some reason the compiler does a relatively poorer job of optimization on a particular FFT radix, or if some FFT radix combinations happen to give better or worse memory-access and cache behavior on the system in question. Such nonmonotonicities have gotten more rare with each recent Mlucas release, and especially so at larger (say, > 1024K) FFT lengths, but they do still crop up now and again.


STEP 3 - RESERVE EXPONENTS FROM PRIMENET

Assuming your self-tests ran successfully, reserve a range of exponents from the GIMPS PrimeNet server.

By far the easiest way to do this and also submit results as they become available is to use the Python script named primenet.py for automated Primenet assignments management - this is to be found in the Mlucas src-directory. NOTE: The current version of the script only supports pre-v3 Python; if v3 is the default on your system, you will need to find which pre-v3 Python binaries are available (e.g. if 'which python' shows /usr/bin/python, then 'ls -l /usr/bin/python*' to see all versions.)

After you create however many run-subdirs you want to run jobs from (say, one per physical CPU core of your system) and copy the mlucas.cfg file resulting from the post-build self-tests into each, you also place a copy of src/primenet.py into each rundir (or prepend 'primenet.py' below with the - absolute or relative - path to the src-directory). Then just cd into each rundir in turn and - assuming you have a valid primenet user account with user ID 'uid' and password 'pwd' (see below on how to create one, if not) - run

python primenet.py -d -T 100 -u [uid] -p [pwd]
Here, -d enables some useful debug diagnostics, nice to use on your first usage of the script. -T 100 means 'get smallest available first-time LL-test exponents'; just grep the .py file for 'worktype' to see the other options. You must be connected to the internet when you launch the script; once it has done its initial work-fetching you can be offline most of the time. The program will simply periodically check whether there are any new results in the run directory in which it was launched; if yes *and* it is able to connect to the primenet server, it will submit the new results (usually just one, unless you are offline nearly all the time) and fetch new work; otherwise it will sleep and retry later. The default is to check for 'results to submit/work to get?' every 6 hours; you may override this via the -t option, followed by your desired time interval in seconds. '-t 0' means run a single-shot get-work-to-do and quit, if for some reason you prefer to periodically run the script manually yourself.

If the script runs successfully you should see a worktodo.ini file (if none existed already, the script creates it; otherwise it appends new work to the existing version of the file) with at least 2 LL-test assignments in it. The script will also periodically check the results.txt file in each run-subdirectory in which it is invoked. Whenever one or more new results are found and a connection to the internet is active during one of these periodic checks, the result is automatically submitted to the Primenet server, and the worktodo.ini file in the run directory is 'topped up' to make sure it has at least 2 valid entries, the first of which will correspond to the currently ongoing job. Thus, the first time you use it, you just need to run the py-script in each local run directory to grab work to be done, then invoke the Mlucas binary to start the Lucas-Lehmer testing.

Offline Testing:

Users who wish to eschew this can continue to use the Primenet manual testing webforms at mersenne.org as described further down on this page, but folks running multiple copies of the program will find that the .py-script greatly simplifies things. See the Get exponents from PrimeNet section for the simple instructions (I am not sure whether that will work from a cloud setup, but it is easy to try). Here's the procedure (for less-experienced users, I suggest toggling between the PrimeNet links and my explanatory comments):

    Each PrimeNet work assignment output line is in the form

    {assignment type}={Unique assignment ID},{Mersenne exponent},{known to have no factors with base-2 logarithm less than},{p-1 factoring has/has-not been tried}

    A pair of typical assignments returned by the server follows:

    Assignment: Test=DDD21F2A0B252E499A9F9020E02FE232,48295213,69,0
    Explanation: M48295213 has not been previously LL-tested (otherwise the assignment would begin with "DoubleCheck=" instead of "Test="). The long hexadecimal string is a unique assignment ID generated by the PrimeNet v5 server as an anti-poaching measure. The ",69" indicates that M48295213 has been trial-factored to depth 2^69 with no factors found. The 0 following the 69 indicates that p-1 factoring has not yet been done; Mlucas currently does not support p-1 factoring, so simply perform a first-time LL test of M48295213.

    Assignment: DoubleCheck=B83D23BF447184F586470457AD1E03AF,22831811,66,1
    Explanation: M22831811 has already had a first-time LL test performed, been trial-factored to a depth of 2^66, and has had p-1 factoring attempted with no small factors found, so perform a second LL test of M22831811 in order to validate the result of the initial test. (Or refute it - in case of mismatching residues for the first-time test and the double-check, a triple-check assignment would be generated by the server, whose format would however still read "DoubleCheck".)

    Copy the Test=... or DoubleCheck=... lines returned by the server into the worktodo.ini file, which must be in the same directory as the mlucas.cfg file and the Mlucas executable (or a symbolic link to it). If the worktodo.ini file does not yet exist, create it. If it already has some existing entries, append any new ones below them.

    Note that Mlucas makes no distinction between first-time LL tests and double-checks - this distinction is only important to the Primenet server.

    Most exponents handed out by the PrimeNet server have already been trial-factored to the recommended depth (i.e. will be of the 'Test' or 'DoubleCheck' assignment type), so in most cases, no additional factoring effort is necessary. If you have exponents that require additional trial factoring, you'll want to either return those assignments or, if you have a fast GPU installed on your system, download the appropriate GPU client from the GPU72 project to do the trial factoring, as those platforms are now much more efficient for such work than using Prime95's TF option on a PC. Mlucas does have trial factoring capability, but that functionality requires significant added work to make it suitable for general-public use. I plan to address that either in v15 or 16, depending on how that part of the code shapes up.

    If you wish to test some non-server-assigned prime exponent, you can simply enter the raw exponent on a line by itself in the worktodo.ini file.
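
The three kinds of worktodo.ini line described above (Test=..., DoubleCheck=..., and a bare exponent) can be read with a few lines of Python. This is only an illustrative sketch: the names Assignment and parse_worktodo_line are hypothetical and are not part of Mlucas or of the Primenet py-script, and the sketch assumes exactly the four comma-separated fields shown above.

```python
from typing import NamedTuple, Optional


class Assignment(NamedTuple):
    kind: str                     # "Test", "DoubleCheck", or "Raw" for a bare exponent
    assignment_id: Optional[str]  # 32-hex-digit ID from the server, None for a bare exponent
    exponent: int                 # the p in M(p) = 2^p - 1
    tf_bits: Optional[int]        # trial-factored to depth 2^tf_bits
    pm1_done: Optional[bool]      # True if p-1 factoring has been tried


def parse_worktodo_line(line: str) -> Assignment:
    """Parse one worktodo.ini line (hypothetical helper, not Mlucas code)."""
    line = line.strip()
    if "=" not in line:
        # A raw exponent on a line by itself.
        return Assignment("Raw", None, int(line), None, None)
    kind, rest = line.split("=", 1)
    assignment_id, exponent, tf_bits, pm1 = rest.split(",")
    return Assignment(kind, assignment_id, int(exponent), int(tf_bits), pm1 == "1")


if __name__ == "__main__":
    print(parse_worktodo_line("Test=DDD21F2A0B252E499A9F9020E02FE232,48295213,69,0"))
    print(parse_worktodo_line("48295213"))
```

Since Mlucas makes no distinction between first-time tests and double-checks, only the exponent field matters to the LL test itself; the other fields are bookkeeping for the server.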


    STEP 4 - LUCAS-LEHMER TESTING

    On a Unix, Linux or MacOS system, cd to each run subdirectory - typically one for each physical core of your CPU - containing a worktodo.ini file, an mlucas.cfg file and a copy of the Mlucas executable (or a link to it, or just use a path name to the single binary) and type

    nice ./Mlucas [-cpu k] &

    The -cpu flag should be used if you are running more than one job - if your timing self-tests indicated that one thread per physical core is best in terms of total throughput (or you just want to get testing ASAP), then each job's k-value (logical core index) should be set like so, where 'n-core' here refers to the number of *physical* (hardware) cores:

    Intel n-core: k runs from 0 through n-1, in increments of 1 between jobs;

    AMD n-core: k runs from 0 through 2n-2, in increments of 2 between jobs.
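
The two numbering rules above amount to a stride of 1 (Intel) or 2 (AMD) between successive jobs. A minimal sketch, assuming one single-threaded job per physical core; the function name single_thread_cpu_flags is hypothetical and exists only for illustration:

```python
from typing import List


def single_thread_cpu_flags(n_physical_cores: int, vendor: str) -> List[str]:
    """Return one '-cpu k' flag per job, one job per physical core.

    Intel: k = 0, 1, ..., n-1      (increment 1 between jobs)
    AMD:   k = 0, 2, ..., 2n-2     (increment 2 between jobs)
    """
    stride = {"intel": 1, "amd": 2}[vendor.lower()]
    return [f"-cpu {job * stride}" for job in range(n_physical_cores)]


if __name__ == "__main__":
    for flag in single_thread_cpu_flags(4, "amd"):
        # One command per run subdirectory, as in the 'nice ./Mlucas' line above.
        print(f"nice ./Mlucas {flag} &")
```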

    Windows Users: Since the emulated Posix build setup does not support multithreaded builds, you will simply need to start as many single-threaded jobs as there are physical cores and, instead of setting affinity via the -cpu flag, rely on the operating system to manage job/core affinity. Windows Task Manager and timings (compared to your self-test ones) will give you a good idea as to whether the OS is up to the task.


    Advanced Usage: Manycore Systems and Multithreaded Runs:

    Note that the -cpu flag supports logical-core parametrization not only via standalone low:high:stride triplets, but also comma-separated triplets. This allows for a highly flexible affinity-setting schema. Let's say I find on my 32-physical-core system that running four Mlucas instances (labeled w0-w3, where w stands for 'worker'), each using eight index-adjacent physical cores and either 8 threads or 16 threads (in the second case we are thus overloading each physical core with 2 software threads) gives the best total system throughput. Then here are the resulting -cpu arguments for each of our 4 jobs (program instances), for the Intel and AMD logical-core-numbering schemes, in both 1-thread-per-physical-core and 2-thread-per-physical-core modes, in terms of -cpu assignments:

    1 thread per physical core:
    Worker  Intel       AMD
    w0      -cpu 0:7    -cpu 0:14:2
    w1      -cpu 8:15   -cpu 16:30:2
    w2      -cpu 16:23  -cpu 32:46:2
    w3      -cpu 24:31  -cpu 48:62:2

    2 threads per physical core:
    Worker  Intel              AMD
    w0      -cpu 0:7,32:39     -cpu 0:15
    w1      -cpu 8:15,40:47    -cpu 16:31
    w2      -cpu 16:23,48:55   -cpu 32:47
    w3      -cpu 24:31,56:63   -cpu 48:63
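
The -cpu arguments in the table above follow a regular pattern, which the sketch below reproduces for any worker count that divides the physical core count. It assumes the logical-core numbering implied by the table: on Intel the second logical core of physical core c is c + n (n = number of physical cores), on AMD the two logical cores of physical core c are 2c and 2c+1. The function name worker_cpu_args is hypothetical, and your system's numbering may differ, so check it before relying on the output.

```python
from typing import List


def worker_cpu_args(n_physical: int, n_workers: int, vendor: str,
                    threads_per_core: int) -> List[str]:
    """Build one -cpu argument per worker, using index-adjacent physical cores."""
    if n_physical % n_workers != 0:
        raise ValueError("n_workers must divide n_physical")
    if threads_per_core not in (1, 2):
        raise ValueError("threads_per_core must be 1 or 2")
    per_worker = n_physical // n_workers
    args = []
    for w in range(n_workers):
        lo = w * per_worker          # first physical core of this worker
        hi = lo + per_worker - 1     # last physical core of this worker
        if vendor.lower() == "intel":
            spec = f"{lo}:{hi}"
            if threads_per_core == 2:
                # Assumed sibling logical cores: offset by n_physical.
                spec += f",{lo + n_physical}:{hi + n_physical}"
        elif vendor.lower() == "amd":
            if threads_per_core == 1:
                spec = f"{2 * lo}:{2 * hi}:2"    # every other logical core
            else:
                spec = f"{2 * lo}:{2 * hi + 1}"  # both logical cores of each physical core
        else:
            raise ValueError("vendor must be 'intel' or 'amd'")
        args.append("-cpu " + spec)
    return args


if __name__ == "__main__":
    for w, arg in enumerate(worker_cpu_args(32, 4, "intel", 2)):
        print(f"w{w}: {arg}")
```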

    The program will run silently in background, leaving you free to do other things or to log out. Every 10000 iterations (or every 100000 if > 4 threads are used), the program writes a timing to the "p{exponent}.stat" file (which is automatically created for each exponent), and writes the current residue and all other data it needs to pick up at this point (in case of a crash or powerdown) to a pair of restart files, named "p{exponent}" and "q{exponent}." (The second is a backup, in the rare event the first is corrupt.) When the exponent finishes, the program writes the least significant 64 bits of the final residue (in hexadecimal form, just like Prime95) to the .stat and results.txt (master output) file. Any round-off or FFT convolution error warnings are written as they are detected both to the status and to the output file, thus preserving a record of them when the Lucas-Lehmer test of the current exponent is completed.
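
To make concrete what "the least significant 64 bits of the final residue" means, here is a minimal sketch of the Lucas-Lehmer test using Python's built-in big integers. It is not how Mlucas computes the result (Mlucas uses vector FFT-based multiplication and is vastly faster), it is only practical for small exponents, and the 16-digit uppercase hexadecimal formatting is an assumption made for illustration.

```python
def lucas_lehmer_residue(p: int) -> int:
    """Final LL residue for M(p) = 2^p - 1, where p is an odd prime.

    s_0 = 4, s_{i+1} = s_i^2 - 2 (mod M(p)); M(p) is prime iff s_{p-2} == 0.
    """
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s


def res64(p: int) -> str:
    """Low 64 bits of the final residue, as 16 hexadecimal digits."""
    return f"{lucas_lehmer_residue(p) & 0xFFFFFFFFFFFFFFFF:016X}"


if __name__ == "__main__":
    for p in (7, 11, 13, 127):
        verdict = "prime" if lucas_lehmer_residue(p) == 0 else "composite"
        print(f"M{p}: Res64 = {res64(p)} ({verdict})")
```

A residue of zero means M(p) is prime; for a composite M(p), the low 64 bits serve as a compact fingerprint that a first-time test and a double-check can be compared on.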

    Dec 2014: The program also saves a persistent p-savefile every 10M iterations, with extensions .10M, .20M, ..., reflecting which iteration the file contains restart data for. This allows for a partial rerun - even parallel reruns of separate 10M-iteration subintervals, if desired - in case the final result proves suspect.

    ADDING NEW EXPONENTS TO THE WORKTODO.INI FILE: You may add or modify ALL BUT THE FIRST EXPONENT (i.e. the current one) in the worktodo.ini file while the program is running. When the current exponent finishes, the program opens the file, deletes the first entry and, if there is another exponent on what was line 2 (and now is line 1), starts work on that one.
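
If you add entries by script rather than by hand, the safest pattern is to append only, so the first (current) entry is never touched. A minimal sketch, assuming plain-text lines in the formats described earlier; the helper name append_assignments is hypothetical and is not part of Mlucas or of the Primenet py-script:

```python
import os
from typing import Iterable


def append_assignments(path: str, new_lines: Iterable[str]) -> None:
    """Append assignments to worktodo.ini without modifying existing entries.

    Creates the file if it does not exist. If the existing file lacks a
    trailing newline, one is added first so entries do not run together.
    """
    needs_newline = False
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb") as f:
            f.seek(-1, os.SEEK_END)
            needs_newline = f.read(1) != b"\n"
    with open(path, "a") as f:
        if needs_newline:
            f.write("\n")
        for line in new_lines:
            f.write(line.strip() + "\n")


if __name__ == "__main__":
    # Queue a raw (non-server-assigned) exponent behind whatever is already there.
    append_assignments("worktodo.ini", ["48295213"])
```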


    STEP 5 - SEND YOUR RESULTS TO PRIMENET

    For users who prefer not to use the automated Python assignments-management script: to report results (either after finishing a range, or as they come in), log in to your PrimeNet account and then proceed to the Manual Test Results Check In. Paste the results you wish to report, that is, one or more lines of the results.txt file (any results which were added to that file since your last check-in), into the large window immediately below.

    If for some reason you need more time than the 180-day default to complete a particular assignment, go to the Manual Test Time Extension page and enter the assignment there.


    TRACKING YOUR CONTRIBUTION

    You can track your overall progress (for both automated and manual testing work) at the PrimeNet server's producer page. Note that this does not include pre-v5-server manual test results. (That includes most of my GIMPS work, in case you were feeling personally slighted ;).


    ALGORITHMIC Q & A