This session provides an overview of the vision and mission of SHARCNET, the high performance computing resources available, and basic usage information covering job submission and management, queuing policies, and available software, as well as the research programming support available to the SHARCNET research community.
The audience is expected to be familiar with performing computational tasks under UNIX and to have a good grasp of the basics of computer science.
This session introduces some basic concepts and practices in high performance computing. The audience will be exposed to a number of key concepts and techniques, including changing memory access patterns, optimizing with compiler switches, using libraries, and parallelization, through a number of simple, easy-to-understand examples.
The audience is expected to have some practical experience in scientific computing and to be familiar with C/C++ or Fortran. Some experience with parallel programming is an asset but is not required.
There are a variety of motivations driving the need for parallel software solutions. It is common to think of parallelism as a means of obtaining results faster; however, equally valid is the desire to model larger problems than are currently feasible, or, equivalently, to replicate existing experimental situations at much higher resolution.
Issues such as simplicity, modularity, and maintainability still warrant our attention; however, it also becomes common to dispense with some of the traditional notions of data abstraction in order to make fundamentals such as communication patterns efficient. This talk provides an overview of the fundamental issues in the parallel design process, as well as critical issues that must be addressed appropriately when considering a parallel solution. The concepts are then explored in more depth through a case study on parallelizing a genetic algorithm.
A CPU is much faster than main memory. CPU caches are small, fast memories, usually found on the same die as the CPU cores, that bridge the speed gap between the cores and main memory. When the CPU requests an address, the cache is checked first: if the address is found, a cache hit occurs; otherwise, a cache miss occurs. A miss requires many additional CPU cycles to fetch the requested data from main memory and thus degrades application performance.

When the cache is full, an existing entry must be chosen for eviction; a cache replacement algorithm makes this choice. The most commonly used policy is Least Recently Used (LRU), which evicts the entry that was accessed least recently. LRU rests on the assumption of temporal locality: recently accessed items are likely to be accessed again in the near future. Under LRU, an address that has just been referenced stays in the cache for the longest possible time.

We will present an analysis of memory access patterns suggesting that not all programs (especially networked applications) exhibit the locality that LRU assumes. We will then show how a change in the cache replacement algorithm can improve hit rates.