The conference will include 5 single-track keynote lectures.
University of Tennessee and Oak Ridge National Laboratory
Impact of Architecture and Technology for Extreme Scale on Software and Algorithm Design
In this talk we examine how high performance computing has changed over the last ten years and look toward future trends. These changes have had, and will continue to have, a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination of compile-time and run-time techniques, but the increased scale of computation, depth of memory hierarchies, range of latencies, and increased run-time environment variability will make these problems much harder.
We will look at five areas of research that will have an important impact on the development of software and algorithms.
We will focus on the following themes:
• Redesign of software to fit multicore architectures
• Automatically tuned application software
• Exploiting mixed precision for performance
• The importance of fault tolerance
• Communication avoiding algorithms
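The mixed-precision theme above can be illustrated with a small sketch (illustrative code, not from the talk): a cheap low-precision solve is refined with residuals computed in higher precision. Here single precision is emulated in pure Python via `struct`, and a 2x2 system stands in for a large factorization.

```python
import struct

def f32(x):
    # Round a Python float (double precision) to IEEE single precision.
    return struct.unpack('f', struct.pack('f', x))[0]

def solve2_single(A, b):
    # Solve a 2x2 system by Cramer's rule, rounding every operation
    # to single precision to emulate a low-precision factorization.
    det = f32(f32(A[0][0] * A[1][1]) - f32(A[0][1] * A[1][0]))
    x0 = f32(f32(f32(b[0] * A[1][1]) - f32(A[0][1] * b[1])) / det)
    x1 = f32(f32(f32(A[0][0] * b[1]) - f32(b[0] * A[1][0])) / det)
    return [x0, x1]

def refine(A, b, iters=3):
    # Mixed-precision iterative refinement: cheap single-precision
    # solves, with residuals and updates accumulated in double precision.
    x = solve2_single(A, b)
    for _ in range(iters):
        r = [b[i] - (A[i][0] * x[0] + A[i][1] * x[1]) for i in range(2)]
        d = solve2_single(A, r)
        x = [x[0] + d[0], x[1] + d[1]]
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b)
```

One low-precision solve plus a few cheap refinement steps typically recovers full double-precision accuracy; the performance argument is that the expensive factorization work runs in the faster, lower precision.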
National Center for Atmospheric Research (NCAR), Boulder, Colorado
Towards Petascale for Atmospheric Simulation
Scaling complex multi-scale, multi-model coupled experiments to run efficiently on tens or hundreds of thousands of processors is becoming a first-order requirement as the world's fastest high-performance computing systems verge on petascale (10^15 floating point operations per second). Users only recently acquainted with scaling as a term of art are now confronted with the fact that there is more than one kind: strong and weak scaling. This talk will preview performance and computing challenges for earth system modelers in the context of our experiences developing, maintaining and supporting the community Weather Research and Forecasting (WRF) model on HPC systems. Topics will include coupling to other models, scaling to many thousands of processors, and improving node speed for strong scaling.
Risto M. Nieminen
Aalto University School of Science and Technology, Helsinki, Finland
Algorithmic challenges for electronic-structure calculations
Computational capabilities for first-principles calculations of electronic properties, both static and dynamic, have increased dramatically in recent years. This calls for new algorithms that allow scalable computations on massively parallel platforms. This talk will discuss the current challenges, recent developments, and some large-scale applications.
Technical University of Denmark
Computational Limits to Nonlinear Inversion
For non-linear inverse problems, the mathematical structure of the mapping from model parameters to data is usually unknown or partly unknown. Absence of information about the mathematical structure of this function prevents us from presenting an analytical solution, so our solution depends on our ability to produce efficient search algorithms.
Such algorithms may be completely problem-independent, or they may be designed with the structure of the concrete problem in mind.
We show that purely problem-independent algorithms (meta-heuristics) are inefficient for large-scale, non-linear inverse problems, and that the 'no-free-lunch' theorem holds. This theorem asserts that the performance of all blind-search algorithms, when averaged over all conceivable problems, is exactly the same. We discuss typical objections to the relevance of the no-free-lunch theorem.
Algorithms adapted to the mathematical structure of the problem perform more efficiently. We study an important class of algorithms, namely those that exploit knowledge of the smoothness of the misfit function of the problem. In many cases where the misfit function is 'band-limited', that is, can be expressed as a linear combination of a limited set of basis functions, we show that there is an optimal sampling strategy for inversion algorithms. However, a large class of smooth inverse problems remains hard, in the sense that the solution time grows at least exponentially with the dimension of the parameter space.
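As a sketch of the band-limited property described above (the symbols S for the misfit, m for the model parameters, and the basis functions are our own notation, not taken from the talk):

```latex
S(\mathbf{m}) \;=\; \sum_{k=1}^{K} c_k \, \varphi_k(\mathbf{m})
```

With only K unknown coefficients, S is determined everywhere once it is known on a sufficiently dense set of samples, which is what makes an optimal sampling strategy possible.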
Efficient and Reliable Algorithms for Challenging Matrix Computations Targeting Multicore Architectures and Massive Parallelism
Along with the evolution towards massively parallel HPC systems with multicore nodes, there is an immense demand for new and improved scalable, efficient, and reliable numerical algorithms and library software for fundamental and challenging matrix computations. Such algorithms and software are used as building blocks for solving current and future very large-scale computational problems in science and engineering.
In this presentation, we review some novel contributions by the Umeå research group, including (1) parallel and cache-efficient in-place matrix storage format conversion; (2) parallel QR and QZ multishift algorithms with advanced deflation strategies; (3) the SCASY library, parallel solvers for Sylvester-type matrix equations with applications in condition estimation.
Topic (1) concerns techniques and algorithms for efficient in-place conversion between standard and blocked matrix storage formats. Such functionality enables numerical libraries to use various data layouts internally for matching blocked algorithms and data structures to a memory hierarchy.
Topics (2) and (3) concern the solution of large-scale dense eigenvalue problems and matrix equations via two-sided transformation methods and provide new functionality for scalable HPC computations. Fundamental two-sided matrix computations include the parallel reduction to Hessenberg and Schur forms. Compared to one-sided factorizations (e.g., LU, Cholesky, QR), these two-sided matrix decompositions have much more complex data dependencies that pose challenging parallel computing problems. A key technique to cope with these problems is our parallel multi-window technique combined with delaying of matrix updates and the accumulation of transformations. As an example, our novel parallel QR algorithm is one to two orders of magnitude faster than the existing ScaLAPACK routine.
PARTICIPATION BY SPONSORS:
Sandro Cambruzzi and Mikkel Riis Johansen
Presentation: The new features of Windows HPC Server 2008 V3 and Microsoft's HPC strategy.
Additional information on Microsoft's HPC involvement will be provided by posters and in a booth.
In addition to the keynote lectures, the conference will be organized in several parallel sessions, with 4–6 presentations (20 min.) in each session. Each session will be dedicated to a particular topic from the topic list below, or to a minisymposium. Alternatively, participants can display research results with a poster. When submitting an abstract for a presentation or a poster, authors are required to select one of the minisymposia, or to allocate the submission to one of the listed topics.
The topics covered in the conference include software, hardware, algorithms, tools, environments, as well as applications of scientific and high performance computing.
• Cloud computing
• Grid computing
• HPC algorithms
• HPC in meteorology
• HPC programming tools
• HPC software engineering
• Image analysis
• Parallel numerical algorithms
• Parallel computing in physics
• Scientific computing tools
Tools and environments for accelerator-based computational biomedicine
organized by Scott B Baden (University of California, San Diego)
Acceleration is a powerful mechanism for delivering high performance in a compact package. However, the programming techniques needed to master the technology are a challenge for end users, who are more concerned with the application and its solution than with the implementation. This mini-symposium will discuss various tools and environments for computational biomedicine that hide implementation details from the user while retaining the high performance benefits of accelerator-based computing. The symposium includes data-intensive applications, cell modeling, and application programmer interfaces.
High Performance Computing Interval Methods
organized by Bartlomiej Kubica (Warsaw University of Technology)
Interval methods are a class of algorithms that are precise and even allow one to obtain guaranteed results. They also provide a useful and appropriate tool to describe the uncertainty of parameters, discretization inaccuracy, and numerical errors. Nevertheless, they are usually time consuming and memory demanding. Hence, all attempts to increase their efficiency are valuable: parallel implementations, the use of new data structures, improved algorithms. The minisymposium will provide a forum for interval researchers to share their experiences and present possible improvements to the algorithms as well as successful applications. Topics of interest include (but are not limited to):
• parallelization of interval methods:
◦ shared-memory, e.g. on multi-core architectures,
◦ distributed-memory, e.g. on grids, clusters, supercomputers, etc.
• the use of novel data formats and data structures for interval computations
• global optimization/equations solving methods
• linear systems with interval parameters
• ordinary and partial differential equations
• practical applications of interval scientific computing algorithms
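As a minimal illustration of the guaranteed-enclosure property mentioned above (an illustrative sketch; a real interval library would use directed rounding on the endpoints, which plain floating point does not provide):

```python
class Interval:
    # Minimal interval type: each value is represented by a pair of
    # bounds [lo, hi], and operations produce bounds enclosing every
    # possible result of the corresponding real operation.
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __mul__(self, other):
        # The product's range is spanned by the four endpoint products.
        ps = [self.lo * other.lo, self.lo * other.hi,
              self.hi * other.lo, self.hi * other.hi]
        return Interval(min(ps), max(ps))

    def __contains__(self, v):
        return self.lo <= v <= self.hi

# Enclose f(x) = x*x + x for an uncertain input x in [-1, 2].
x = Interval(-1.0, 2.0)
y = x * x + x
```

Evaluating f(x) = x*x + x on x = [-1, 2] yields the enclosure [-3, 6]: every true value of f lies inside, although the bound overestimates the exact range [-0.25, 6] because the two occurrences of x are treated as independent (the dependency problem that improved interval algorithms try to mitigate).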
Scalable Tools for High Performance Computing
organized by Luiz DeRose (Cray Inc.) and Felix Wolf (German Research School for Simulation Sciences)
The scale of current and future high end systems, as well as the increasing system software and architecture complexity, brings a new set of challenges for application developers. In order to achieve high performance on petascale systems, application developers need tools that can address and hide the issues of scale and complexity of high end HPC systems. This mini-symposium will focus on scalable debugging and performance analysis tools that can handle the challenges associated with heterogeneous architectures, hundreds of thousands of computing elements, multiple levels of parallelism, multiple memory levels, and novel programming paradigms.
Memory and Multicore Issues in Scientific Computing - Theory and Practice
organized by Michael Bader (Universität Stuttgart) and Riko Jacob (Technische Universität München)
The widening gap between memory performance (bandwidth and latency) and CPU performance, especially on parallel systems, has forced the development of increasingly complex memory hierarchies, both for cache memory on CPUs and for shared-memory compute nodes in HPC platforms. As many algorithms in scientific computing are memory-bound, such limitations have to be taken into account explicitly in algorithm design. The proposed minisymposium will thus deal with new algorithmic approaches and techniques for efficient implementation, as well as with the theoretical foundations of such algorithms, for example:
• Hardware-aware algorithms for compute- and memory-intensive real-world problems, and the analysis of such algorithms in theoretical models (I/O model, PEM)
• Cache-oblivious (in general, memory-architecture-oblivious) algorithms in scientific computing
• Multi-/many-core-aware approaches for small- and large-scale parallel simulations (including software development and code optimization strategies for such systems)
• Algorithms and models for platforms with hierarchical communication layout
• New programming models for parallelization on multi-core and hybrid platforms
• Tools for performance and cache behavior analysis (including cache simulation)
Multicore algorithms and implementations for application problems
organized by Sverker Holmgren (Uppsala University)
Today, all modern computer systems used for computing in science and engineering comprise multicore processors. The systems range from laptops with a dual- or quad-core processor to huge-scale computers where the nodes have several multicore sockets. Also, GPU accelerators with manycore processors are attracting a lot of attention within the high-performance computing community.
A common feature of multicore systems is the bottleneck introduced by data transfers crossing the socket/GPU card boundary. Often, there is also a potential for very fast communication between the cores within a socket. For performance, it is essential that the parallel algorithms used are adapted to these features. In this minisymposium, parallel algorithms and implementations for a number of important application problem kernels are considered, and it is described how they are modified to deal with the specific features of multicore and hierarchical memory systems.
Workflows for Large-Scale and HPC Scientific Applications
organized by Jose C. Cunha (Universidade Nova de Lisboa)
One focus of this workshop is on the modeling and expressiveness capabilities of scientific workflows, concerning the large-scale, dynamic nature, and complexity of the applications. Innovative proposals and case studies are sought where the above capabilities can be illustrated. Another focus concerns the mappings from abstract workflow specifications and their intermediate representations onto the heterogeneous computing platforms now available, ranging from HPC multicore machines to large clusters and grid and cloud platforms.
Parallelizing nature inspired algorithms
organized by Tomas Philip Runarsson (University of Iceland)
A vast literature exists on nature-inspired approaches to solving an impressive set of problems. These include evolutionary and genetic algorithms, neural networks, artificial life, multi-agent systems, ant algorithms, artificial immune systems, cellular automata, memetic algorithms, particle swarms, swarm intelligence, and more. Most nature-based techniques are inherently parallel. Thus, solutions based on such methods can be conveniently implemented on parallel architectures. The purpose of this minisymposium is to discuss suitable parallel or distributed computing environments and techniques for such nature-inspired systems.
Simulations of Atomic Scale Systems
organized by Hannes Jonsson (University of Iceland)
New algorithms and computational approaches to large-scale simulations of materials, molecules, liquids, etc. on the atomic scale, including distributed and parallel computing as well as new and improved approximations to the (intractable) fundamental equations and their numerical solution.
GPU Computing
organized by Anne C. Elster (NTNU, Trondheim)
The success of the gaming industry is now pushing processor technology like we have never seen before. Since recent graphics processors (GPUs) have improved their programmability and added more and more floating-point processing power, they have become very appealing as accelerators for general-purpose computing. This minisymposium gives an overview of some of these advancements by bringing together experts working on the development of techniques and tools that improve the programmability of GPUs and experts interested in utilizing the computational power of GPUs for scientific applications. The first PARA minisymposium on GPU computing was organized at PARA 2008 by Enrique S. Quintana-Orti, José R. Herrero and Anne C. Elster. It was followed by EuroGPU 2009 (organized by Elster and Stéphane Requena) at ParCo 2009 and a GPU minisymposium at PPAM 2009 (organized by Herrero, Quintana-Orti and Robert Strzodka).
Distributed Computing Infrastructure Interoperability
organized by Morris Riedel (Forschungszentrum Jülich)
The workshop will continue a series of workshops on the subject, namely the International Grid Interoperability and Interoperation Workshop (IGIIW) 2007 at the IEEE e-science conference in Bangalore, and IGIIW 2008 at the IEEE e-science conference in Indianapolis. In 2009, Riedel organized a special issue of the Journal of Grid Computing on "Grid Interoperability". This time the scope will be extended slightly to distributed computing infrastructures (i.e. HTC-based and HPC-driven infrastructure interoperability).
Linear Algebra Algorithms and Software for Multicore and Hybrid Architectures
organized by Jack Dongarra (University of Tennessee)
in honor of Fred Gustavson on his 75th birthday.
This minisymposium is organized in honor of Fred Gustavson on the occasion of his 75th birthday, which takes place on May 29. Fred has spent most of his long and distinguished career at the IBM T.J. Watson Research Center, New York, from which he retired last year; he continues his scientific work as adjunct professor at Umeå University. Since the mid 1980s, Fred has focused his work on the development of efficient algorithms and high-quality library software for dense linear algebra computations executing on state-of-the-art High Performance Computing systems. A major theme of his research has been to understand, and give new insight into, the interaction between algorithms and data structures on the one hand and different architectural features on the other, such as vector processors, memory hierarchies, and different types of parallelism. Not surprisingly, this coincides with the topic of the minisymposium, where we will look at new algorithms and trends in exploiting High Performance and Parallel Computing.
In addition, an invited presentation by Fred Gustavson and presentations by Jack Dongarra, Emmanuel Agullo, Alfredo Buttari, Bo Kågström, Lars Karlsson, Carl Christian K. Mikkelsen, and Julien Langou are planned.
Fast PDE Solvers and A Posteriori Error Estimates
organized by Jan Valdman (University of Iceland) and Talal Rahman (Bergen University College)
There are two main topics of this minisymposium: 1) fast solvers for partial differential equations, such as multigrid and domain decomposition solvers and multiscale methods, including parallel implementations (or even vectorized implementations in Matlab); 2) a posteriori error analysis leading to higher computational efficiency, together with adaptive refinement techniques such as hp-methods.
Real-Time Access and Processing of Large Data Sets
organized by Helmut Neukirchen (University of Iceland) and Michael Schmelling (Max-Planck Institute for Nuclear Physics, Heidelberg)
The Large Hadron Collider at CERN will soon be producing Petabytes of data which will need to be stored and analyzed. To do this more effectively than foreseen within the current computing models, it will be necessary to develop a novel scalable and distributed analysis infrastructure to allow real-time random access and interaction with these volumes of data.
This mini symposium will address some of the major issues faced by this problem including (but not limited to):
• Redundant, distributed and encrypted non-RAID data storage approaches that are able to directly address individual events
• Networks of autonomous computing/storage/routing (CSR) units
• Data collection protocols capable of dealing with variable-sized event records and generating cumulative statistics from all events
• Dynamically-adaptive load-balancing protocols
• Data analysis middleware with standard, operating-system-like interfaces
• Peer-to-peer broadcast mechanisms to distribute both the analysis middleware and all query jobs to nodes
• Data distribution protocols
• Real-time visualisations of both the active system and the results of analysis queries
This mini symposium will involve both presentations and BoF (Birds of a Feather) or panel discussions of these topics. It is open for submission of abstracts for presentations and participation in the discussions. Contact the organizers for further details: