Algorithm Development and High-Performance Software Implementation: Areas I've done serious analysis and coding in include:
Programming Languages & Skills: Fluency in C, C++, OOP, inline assembly (both Intel and AT&T/GCC syntax, x86, SSE SIMD), multithreading (mainly Pthreads and various thread-affinity extensions thereto), floating-point, multiword arithmetic, instruction set architectures (mainly the x86_64 ISA, also the now-legacy Alpha RISC).
Data Structures: Fluent in implementation of graphs, lists, trees, queues, heaps, etc, both in roll-your-own fashion and using the C++ STL, also with use of high-efficiency GCC extensions such as hash set and hash map; significant experience with matrix computations, discrete transforms & convolutions.
|B. S.||The University of Michigan, Ann Arbor||1985||Aerospace Engineering|
|M. S.||The University of Michigan, Ann Arbor||1987||Aerospace Engineering|
|M. A.||The University of Michigan, Ann Arbor||1991||Mathematics|
|Ph.D.||The University of Michigan, Ann Arbor||1993||Aerospace Engineering|
|Sep 2007 - Nov 2011||
Senior Staff R&D Engineer
|Wrote the schematic-versus-layout-netlist comparison code for the SDL (schematic-driven layout) toolsuite of the Custom Designer (CD) chip-design tool. SDL was a brand-new project group budgeted for 4-6 full-time developers when I joined it. Early stages of SDL involved deploying a project infrastructure centered around a pair of schematic and layout netlists, resulting from parsing and elaboration of design data stored in the OpenAccess (OA) database atop which CD is built. The verify engine (which I had major responsibility for) compared the 2 netlists and flagged mismatches of various kinds (user had control over which of the possible menu of detectable errors were to be flagged) via OA markers created in the layout. These markers were used by a subsequent eco-fix engine, which supported both batch-mode and incremental (user-selected) error repair. The early versions of verify were device-name-based, which made netlist compare an exercise (or better, nested series of exercises) in predicated-sorting and list-comparison, taking account of things including layout-device multiplicity (e.g. folded-device groups), multi-device accumulated parameters, inexact compare (to specified tolerance} of floating-point-style device parameters, and pin-permute rules of arbitrary complexity. In the past 2 years the major verify-side effort was centered on deployment of a topological (circuit-structural) netlist-compare capability, which essentially reduces to a graph isomorphism (or more importantly, non-isomorphism, since structurally mismatched graphs are the true challenge inputs) algorithm, again capable of handling the types of graphs which commonly arise in circuit-compare problems - many repeated or quasi-repeated elements, portions of the design with missing or severely broken connectivity, and so forth. Closest coworkers on SDL were Haichun Chen (project lead), Greg Woolhiser and Kuldeep Karlcut.|
|Jan 2006 - Aug 2007||
Senior Software Engineer
Agate Logic, Inc.
|Agate is the company which bought out Leopard Logic's IP in mid-2005. Ongoing SW tools development, mainly in the areas of router, architectural modeling and optimization, architectural development, quality of results of the p&r tool flow, and helping train and work with a group of young engineers in the Agate Logic Beijing office. Also developed a general method for hybrid planar/hierarchical integrated-circuit interconnect while there (Cf. U.S. Patent 7,786,757).|
|Nov 2001 - Oct 2004||
Staff Software Engineer
Leopard Logic, Inc.
|The Leopard Logic "router guy." More-or-less sole responsibility for development of the production router tools for both the original LL FPGA core and later for the hybrid Gladiator CLD core. For the hierarchical FP core, devised a novel "reverse routing" algorithm that is both extremely fast and provably optimal in terms of mux usage (Cf. U.S. Patent 7,725,863). This allowed us to reduce the area of the chip by roughly one-third and (despite the reduced mux counts) routinely route designs having over 90% cell utilization. For Gladiator, the challenge was to quickly develop and productize a placement engine and fast, efficient global and detail router for a mask-programmable heterogenous logic fabric consisting of 64K 4-lut-based corecells, 64 RAM/MAC pairs running through the MP base in 8 vertical columns, surrounded by a set of horizontal and vertical outer routing channels and 8 I/O banks, each with a different capacity. Despite extremely limited manpower due to very restricted funding, the Gladiator software-tool development effort was successful beyond any reasonable expectation.|
|Jul 2000 - Oct 2001||
Adaptive Silicon, Inc.
Los Gatos, CA
|Major responsibility for development of the detail-router component of the integrated Verdi toolsuite for the ASi embeddable FPGA core, as well as numerous high-performance graph-theoretic utility algorithms|
|Jul 1993 - Jun 1999||
Dept. of Mech.&Aero. Engineering
Case Western Reserve University
10900 Euclid Avenue
Cleveland, OH 44106-7222
|Taught undergraduate and graduate classes in Computational Fluid Dynamics, Numerical Methods in Engineering, Advanced Fluid Mechanics. Supervised the Senior Project capstone design class, as well as two PhD dissertations (Dr. Sih-Tsan Lee and Dr. Kamran Eftekhari) and several Masters' theses.|
7,725,863, Reverse Routing Methods for Integrated Circuits Having an Hierarchical Interconnect Architecture, Granted 25. May. 2010.
7,786,757:, Integrated Circuits with Hybrid Planar Hierarchical Architecture and Methods for Interconnecting Their Resources, Granted 31. August 2010.
R. E. Crandall, E. W. Mayer and J. S. Papadopoulos, The twenty-fourth Fermat number is composite, Mathematics of Computation 72 (243), pp.1555-1572, December 2002.
E. W. Mayer and E. Reshotko, ``Evidence for transient disturbance growth in a 1961 pipe-flow experiment,'' Physics of Fluids 9, pp. 242-244, January 1997. (Note that a slightly longer-form version which was published as a CWRU technical memorandum is available in PDF form here. As the original LaTeX source is long lost, thanks to my friends Jane and Bob for the high-quality PDF-from-original-hardcopy, which was the only form I still had the paper in.)
E. W. Mayer, ``Making Waves,'' correspondence, Nature 384, p. 105, 1996.
E. W. Mayer and K. G. Powell, ``Viscous and inviscid instabilities of a trailing vortex,'' Journal of Fluid Mechanics 245, pp. 91-114, December 1992.
E. W. Mayer and K. G. Powell, ``Similarity solutions for viscous vortex cores,'' Journal of Fluid Mechanics 238, pp. 487-508, May 1992.
Mlucas: an open-source program for testing the character of Fermat and Mersenne numbers.
E. W. Mayer, Efficient long division via Montgomery multiply.
Iterative Methods for Solving Nonlinear Algebraic Equations: A web tutorial on the Newton-Raphson method, its higher-order-convergent analogs, and applications in areas such as solution of nonlinear differential equations and number theory.
S.-T. Lee and E. W. Mayer, ``Long's vortex revisited''. Due to career changes on the part of both authors, this work based On Lee`s 1998 PhD dissertation languished for a decade. I finally cleaned it up and submitted it to the Journal of Fluid Mechanics in early 2010. The reviews were strongly bipolar, but in hopes the generalized asymptotics for the Long-style vortices or the hybrid asymptotic/numeric approach for automated matching of the far-field and inner-core solutions may prove interesting to other researchers in the field, the paper has been 'arXived' online at the above link.
January 2016: 49th Known Mersenne prime verified using a multithreaded AVX2-based build of my Mlucas code.
February 2013: 48th Known Mersenne prime verified using a multithreaded SSE2-based build of my Mlucas code.
November 2009: Mlucas 3.0 beta code released - main feature is optimized SSE2 vector-double-precision support for the x86_64 (Intel core2 and AMD64) chip families.
April 2009: 47th Known Mersenne prime verified using my Mlucas code.
September 2008: 45th and 46th Known Mersenne prime verified using a multithreaded in-development version of my Mlucas code. M45 was a new world-record prime = 243112609-1, a number having a whopping 12,978,189 digits! The parallel code achieves an average processor utilization (compared to single-threaded) of 70% running 16-threaded on 8 dual-cores of a Sparc VII server and 4 quad cores of a Sparc VIII for large-array double-precision FFTs of lengths 2048K and 4096K floating doubles, respectively.
2007: Received a grant via the OpenSolaris - Google Summer of Code Project for a summer student (Tom Harper, co-mentored by Robert Giltrap and Tom Duell of Oracle Corporation and myself) to deploy some highly parallelizable large-vector FFT code within the framework of my Mlucas open-source Mersenne-number-testing program. Using a coarse-grained parallel-FFT framework I had developed and done most of the code prototyping and debugging for in 2005, via a combination of thread-level debugging and further parallel-bottleneck removal we soon achieved some very impressive parallel performance: superlinear multithreaded performance on an Itanium/Linux system using up to 4 threads, 6x speedup for 8-threaded and 10x for 16-threaded on a 16-core Sparc VI system. Due to the difficulty of parallelizing an inherently data-nonlocal algorithm like FFT (especially for out-of-cache dataset sizes), this degree of parallelism is unprecedented in this context.
1999-2004: Using my Mlucas code, performed official independent-hardware-and-software verification runs of the newly-discovered Mersenne primes M6972593 (2098960 decimal digits), M13466917 (4053946 decimal digits), M20996011 (6320430 decimal digits) and M24036583 (7235733 decimal digits).
1999-2000: In collaboration with Richard Crandall and Jason Papadopoulos, established the composite character of the 24th Fermat Number. At the time, this was the largest number ever resolved as prime or composite via nonfactorial compositeness test.
1999: Lucas, a prototype version of my ever-evolving discrete-weighted-transform-based Lucas-Lehmer code for testing Mersenne numbers, makes it into the SPEC 2000 floating-point benchmark suite.
1993: Honorable mention 2nd place, McIvor Award (for outstanding research in Applied Mechanics), The University of Michigan College of Engineering.
1992: Outstanding Graduate Student Achievement Award, Department of Aerospace Engineering, The University of Michigan.
1990-1991: NASA Graduate Student Researcher Fellowship, Lewis Research Center.
1989: Research Partnership Award, Horace H. Rackham School of Graduate Studies, University of Michigan.
1987: Ralph W. Dobbins Fellow, Department of Aerospace Engineering, The University of Michigan.
Development of fast numerical algorithms for transform-based large-integer arithmetic, primality proving, and factorization. Development of advanced fast Fourier transform (FFT) algorithms optimized for modern cache-based microprocessor architectures, and (in ongoing collaboration with Peter L. Montgomery) hybrid floating-point/integer transform algorithms for factorization and primality testing of very large integers, targeted for leading-edge architectures with a good balance of floating-point and integer capabilities such as the Alpha 21264, Intel IA-64, AMD Opteron and Apple/Motorola PowerPC/AltiVec). In 1999 (in collaboration with Richard E. Crandall and Jason S. Papadopoulos) conducted computational proof of the composite character of the twenty-fourth Fermat number (5050446 decimal digits) using a highly efficient floating-point-based code of my own writing, running on a MIPS R10000 single-processor workstation. (JSP conducted a second, independent floating-point-based proof, and REC organized a distributed volunteer effort which used checkpoint files generated by the floating-point runs to effect a parallel, all-integer ``wavefront'' check of the Pépin squares using multiple machines, including 5 Pentia paid for by a grant from the Number Theory Foundation.)
Applied mathematics/numerical analysis:
Mathematical analysis of finite-difference, finite-volume and spectral methods to solve linear and nonlinear ordinary and partial differential equations; weakly and strongly nonlinear hydrodynamic stability theory; nonlinear wave propagation and singularity formation; sensitivity analysis of hydrodynamic stability operators. Development of high-accuracy numerical algorithms for the generalized linear eigenproblem
Ax=cBx. The latter algorithm is far more accurate than the generalized-eigensystem routines in LAPACK package, considered the state of the art in numerical linear algebra.
Theoretical and computational fluid dynamics:
Application of finite-difference, finite-volume and spectral methods to solve linear and nonlinear ordinary and partial differential equations arising in boundary-layer theory, swirling flows, compressible aerodynamics and hydrodynamic stability theory.
Classic Iterative Methods for Polynomials:
In 1994, after reading a brief but very interesting SIAM Review Classroom Note by J. Gerlach ("Accelerated Convergence in Newton's Method," SIAM Review 36, 272-276), discovered an elegant recurrence formula for generating Newton-style iterative schemes of arbitrarily high order: For f(x) sufficiently smooth, an Nth-order-converging iterative scheme is defined by
|xk+1 = xk -||f f'
[(f')2 - f f''/2]
|xk+1 = xk -||f [(f')2 - f f''/2]
[(f')3 - f f' f'' + f2 f'''/6]
Study of the structure and dynamics of concentrated vortices in aerodynamics, geophysical and bio-fluid dynamics, and of particular phenomena such as vortex instability and breakdown.
Theoretical and numerical hydrodynamic stability theory:
Study of linear and nonlinear stability of swirling and nonswirling shear flows, the mathematical behavior and numerical solution of equations arising in this context, and applicability to understanding of physical flow instabilities including (but not limited to) shear-layer instabilities, swirling-flow instabilities/vortex breakdown and transition to turbulence.