An auto-tuning framework for parallel multicore stencil computations

Abstract

Although stencil auto-tuning has shown tremendous potential in effectively utilizing architectural resources, it has hitherto been limited to single kernel instantiations; in addition, the large variety of stencil kernels used in practice makes this computation pattern difficult to assemble into a library. This work presents a stencil auto-tuning framework that significantly advances programmer productivity by automatically converting a straightforward sequential Fortran 95 stencil expression into tuned parallel implementations in Fortran, C, or CUDA, thus allowing performance portability across diverse computer architectures, including the AMD Barcelona, Intel Nehalem, Sun Victoria Falls, and the latest NVIDIA GPUs. Results show that our generalized methodology delivers significant performance gains of up to 22× speedup over the reference serial implementation. Overall we demonstrate that such domain-specific auto-tuners hold enormous promise for architectural efficiency, programmer productivity, performance portability, and algorithmic adaptability on existing and emerging multicore systems. © 2010 IEEE.

Other Versions

No versions found

Links

PhilArchive



    Upload a copy of this work     Papers currently archived: 96,272

External links

Setup an account with your affiliations in order to access resources via your University's proxy server

Through your library

  • Only published works are available at libraries.

Similar books and articles

Analytics

Added to PP
2017-04-27

Downloads
4 (#1,843,490)

6 months
2 (#1,854,942)

Historical graph of downloads
How can I increase my downloads?

Author Profiles

Clancy Chan
University of Massachusetts, Boston
Charlotte Chan
University of British Columbia

Citations of this work

No citations found.

Add more citations

References found in this work

No references found.

Add more references