Henri Bal: Large-scale parallel computing on grids

Computational grids are attractive platforms for solving large-scale computational problems, because they consist of many (geographically distributed) resources. Thus far, grids have mainly been used for high-throughput computing on independent (or trivially parallel) jobs. However, advances in grid software (programming environments, schedulers) and optical networking technology make it increasingly feasible to use grids for solving challenging large-scale problems.
The talk will first give a brief introduction to grid infrastructures, using the Dutch DAS-3 Computer Science grid as an example. DAS-3 has a flexible and reconfigurable 40 Gb/s optical network called StarPlane between its five clusters and a 10 Gb/s dedicated optical link to the French Grid'5000 system. From a parallel programming point of view, grids like DAS-3 are characterized by a high-latency/high-bandwidth network and a hierarchical structure.

Next, the talk will discuss how algorithms and applications can be optimized to run in such an environment. It focuses on search applications such as retrograde analysis, which, much like model checkers, analyze huge search spaces. As a case study, we have implemented an application that solves the game of Awari, which has 900 billion different states. Several optimizations were needed to obtain high performance on DAS-3/StarPlane.
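
For readers unfamiliar with retrograde analysis, the idea can be sketched on a toy game. The sequential Python sketch below is not the Awari solver from the talk: it assumes a simple subtraction game (players alternately remove 1, 2 or 3 stones, and the player who cannot move loses), and the name retrograde_solve and its parameters are hypothetical. It starts from the terminal state and works backwards through predecessor states, which is the general pattern a distributed solver would have to partition across machines.

```python
from collections import deque

WIN, LOSS = "win", "loss"

def retrograde_solve(max_stones, moves=(1, 2, 3)):
    """Label every state 0..max_stones as WIN or LOSS for the player to move.

    Toy subtraction game (an assumption for illustration, not Awari):
    a move removes m stones, and a player with no legal move loses.
    """
    # Terminal state: no stones left, so the player to move loses.
    value = {0: LOSS}
    # For each state, count the successors not yet proven to be wins
    # for the opponent.
    remaining = {
        n: sum(1 for m in moves if m <= n) for n in range(max_stones + 1)
    }
    queue = deque([0])
    while queue:
        state = queue.popleft()
        for m in moves:
            pred = state + m  # "unmove": a state from which a move reaches `state`
            if pred > max_stones or pred in value:
                continue
            if value[state] == LOSS:
                # The player at `pred` can move to a lost position for the opponent.
                value[pred] = WIN
                queue.append(pred)
            else:
                # One more successor of `pred` is a win for the opponent.
                remaining[pred] -= 1
                if remaining[pred] == 0:
                    value[pred] = LOSS
                    queue.append(pred)
    return value

if __name__ == "__main__":
    table = retrograde_solve(12)
    print([n for n, v in sorted(table.items()) if v == LOSS])
```

In this toy game the whole state table fits in one dictionary. For a game with hundreds of billions of states, the table would have to be spread over many nodes, so that propagating a value to a predecessor can mean sending a message across the network.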

The last part of the talk will discuss research on programming environments that will make it easier to develop parallel applications for grids. Grid programmers often have to use low-level programming interfaces that change frequently, and they have to deal with heterogeneity, connectivity problems, security issues, and dynamically changing execution environments. The Ibis project aims to drastically simplify the whole programming and deployment process of high-performance grid applications. The philosophy of Ibis is that grid applications should be developed on a local workstation and simply be launched from there. Ibis uses middleware-independent Application Programming Interfaces with different abstraction levels, ranging from low-level message passing to high-level divide-and-conquer parallelism and group communication.