Optimizing and tuning the fast multipole method for state-of-the-art multicore architectures

Abstract

This work presents the first extensive study of single-node performance optimization, tuning, and analysis of the fast multipole method on modern multicore systems. We consider single- and double-precision with numerous performance enhancements, including low-level tuning, numerical approximation, data structure transformations, OpenMP parallelization, and algorithmic tuning. Among our numerous findings, we show that optimization and parallelization can improve double-precision performance by 25× on Intel's quad-core Nehalem, 9.4× on AMD's quad-core Barcelona, and 37.6× on Sun's Victoria Falls. We also compare our single-precision version against our prior state-of-the-art GPU-based code and show, surprisingly, that the most advanced multicore architecture reaches parity in both performance and power efficiency with NVIDIA's most advanced GPU architecture. © 2010 IEEE.
