ICOM 5995
Performance Instrumentation and
Visualization for High Performance Computer Systems
Department of Electrical & Computer Engineering
University of Puerto Rico – Mayaguez
Fall 2002
Lecture 3: September 18, 2002
Announcements:
· Please check that your name appears in the official list of students of ICOM 5995
· Attendance is a must. The attendance report to the university was already generated.
Topics:
Overview of parallel and distributed systems
System architectures
Parallel and Distributed Systems: Architectures
The computing power requirements of scientific applications have led to different approaches for meeting the demands of processing speed, memory size and speed, and data input/output rates. Increases in performance have come from several advances, some of which are described below.
Goal: To execute as many operations in the processor per clock cycle as possible.
How: by performing several operations at the same time in the processor.
This leads to different approaches in architecture.
Classes of processors:
· Superscalar
· Superpipeline
· Long Instruction Word
Superscalar Processor
Many functional units in the processor work at the same time. The compiler generates an instruction mix of independent instructions to keep the units busy.
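As a hedged illustration (not from the lecture notes), the C sketch below contrasts a loop whose additions form a single dependency chain with a loop that uses four independent accumulators; the second form gives a superscalar processor independent instructions that its multiple units can issue in the same cycle.

    /* Illustrative sketch: exposing independent instructions to a superscalar CPU. */
    #include <stdio.h>

    #define N 1024

    /* One accumulator: every addition depends on the previous one. */
    double sum_chained(const double *x)
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            s += x[i];
        return s;
    }

    /* Four independent accumulators: the additions within an iteration are
     * independent, so several functional units can be kept busy per cycle. */
    double sum_unrolled(const double *x)
    {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int i = 0; i < N; i += 4) {
            s0 += x[i];
            s1 += x[i + 1];
            s2 += x[i + 2];
            s3 += x[i + 3];
        }
        return s0 + s1 + s2 + s3;
    }

    int main(void)
    {
        double x[N];
        for (int i = 0; i < N; i++)
            x[i] = 1.0;
        printf("%f %f\n", sum_chained(x), sum_unrolled(x));
        return 0;
    }

Both functions compute the same sum; the only difference is how much instruction-level parallelism they expose to the hardware.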
Superpipelined Processors
The stages of the processor's functional units are divided into simpler stages that work together. This increases the overall speed of the pipeline, allowing higher clock speeds and therefore higher throughput.
Long Instruction Word
Each instruction word joins two or more regular instructions into a single long instruction word; the simpler instructions it contains are executed simultaneously. The processor relies on the compiler to generate these instruction words. VLIW stands for Very Long Instruction Word. Several floating-point, integer, branch, and memory operations may be initiated each clock cycle, so the instructions exercise multiple functional units on every cycle.
Other Advanced Features:
Branch Prediction: Avoids the penalty of flushing and refilling the pipeline when a conditional branch is encountered. Several schemes exist to predict which alternative of a branch decision will be taken. One approach is to record which way the processor has gone in the past and predict future behavior from that history.
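As a hedged illustration (not from the lecture notes), the sketch below runs the same data-dependent branch over random data and over data arranged in long runs; a history-based predictor does well in the second case and poorly in the first, where frequent mispredictions force pipeline flushes.

    /* Illustrative sketch: the same branch is unpredictable on random data
     * and highly predictable on data with long runs of identical outcomes. */
    #include <stdio.h>
    #include <stdlib.h>

    #define N 100000

    long count_large(const int *x)
    {
        long count = 0;
        for (int i = 0; i < N; i++) {
            if (x[i] >= 128)          /* conditional branch evaluated per element */
                count++;
        }
        return count;
    }

    int main(void)
    {
        static int data[N];

        for (int i = 0; i < N; i++)   /* random values: outcome hard to predict */
            data[i] = rand() % 256;
        printf("random data: %ld\n", count_large(data));

        for (int i = 0; i < N; i++)   /* increasing values: long predictable runs */
            data[i] = (i * 256) / N;
        printf("sorted data: %ld\n", count_large(data));
        return 0;
    }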
Parallel Architectures
Scalable parallel processors (SPPs) are also used to meet performance demands. SPPs consist of hundreds or thousands of state-of-the-art interconnected processors. The goal of SPPs is to obtain fast computers through highly parallel designs and substantial parallelism within each processor. Issues to consider when designing SPPs include whether to use Single-Instruction Multiple-Data (SIMD), Multiple-Instruction Multiple-Data (MIMD), or Very Long Instruction Word (VLIW) processors. In the SIMD architecture, the same instruction is executed on multiple processors at the same time. In MIMD, each processor operates on its own instructions and data. The real challenge with SPPs is making the system perform at its full potential. Some of the challenges include decisions on the interconnection hardware, memory system design, compilers, and algorithm design.
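As a hedged sketch (not from the lecture notes), the code below contrasts the two models in plain C: a data-parallel vector addition captures the SIMD idea of one instruction stream applied to many data elements, while the worker function mimics the MIMD idea of each processor running its own instruction stream, selected here by a hypothetical rank identifier.

    #include <stdio.h>

    #define N 8

    /* SIMD view: the same operation is applied to every data element. */
    void vector_add(const float *a, const float *b, float *c, int n)
    {
        for (int i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }

    /* MIMD view: each processor executes its own instruction stream on its
     * own data; 'rank' is a hypothetical processor identifier. */
    void mimd_worker(int rank)
    {
        if (rank == 0)
            printf("rank 0: distributing work\n");
        else
            printf("rank %d: computing on its local data\n", rank);
    }

    int main(void)
    {
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }
        vector_add(a, b, c, N);
        printf("c[%d] = %.1f\n", N - 1, c[N - 1]);

        for (int rank = 0; rank < 4; rank++)   /* pretend four MIMD processors */
            mimd_worker(rank);
        return 0;
    }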
In high-performance systems, efficient memory access schemes are needed. Multiple levels of cache are used to speed up data and instruction access. Instruction reordering and data prefetching are used to hide the latency caused by slow memories. Compilers are thus left with the task of generating efficient code that takes advantage of the hardware. Memory access in shared-memory systems must incorporate cache coherence mechanisms to avoid reading stale data from the caches.
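As a hedged illustration (not from the lecture notes), the sketch below sums the same matrix twice: the row-major traversal matches C's memory layout and reuses cache lines, while the column-major traversal strides across memory and suffers far more cache misses.

    /* Illustrative sketch: memory access order determines cache behavior. */
    #include <stdio.h>

    #define N 512

    static double a[N][N];

    double sum_row_major(void)        /* unit stride: good cache locality */
    {
        double s = 0.0;
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s += a[i][j];
        return s;
    }

    double sum_column_major(void)     /* stride of N doubles: poor locality */
    {
        double s = 0.0;
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s += a[i][j];
        return s;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                a[i][j] = 1.0;
        printf("%f %f\n", sum_row_major(), sum_column_major());
        return 0;
    }

Both functions return the same result; only the traversal order, and therefore the cache behavior, differs.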
Metacomputing, also called grid computing, is another approach to obtaining high-performance systems. The idea behind metacomputing is to build a highly powerful distributed system out of physically distributed computers, so that the best available resources can be used jointly to solve a problem. The system should be transparent to the users, who concentrate on the solution of their respective problems and not on the computational requirements of the problem. There are several issues in the development of such a global computing platform, such as software compatibility, high-performance networks, security, and user-friendly interfaces. Metacomputing involves the interconnection of high-performance networks, implementing a distributed file system, coordinating user access to different computational structures, and making the environment easy to use and transparent to the user.
Ethernet is used as the local-area network connection in network-of-workstations (NOW) architectures. In a network of workstations, each node of the system is a workstation that collaborates with the others via message passing.
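Since the course cluster runs LAM/MPI, the following is a minimal message-passing sketch in C (an assumed example, not part of the lecture notes): every non-zero rank sends its rank number to node 0, which prints what it receives. With LAM such a program would typically be compiled with mpicc and launched with mpirun once the nodes have been booted.

    /* Minimal message-passing sketch for a network of workstations. */
    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        int rank, size;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this workstation's id */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of workstations */

        if (rank == 0) {
            int value;
            for (int src = 1; src < size; src++) {
                MPI_Recv(&value, 1, MPI_INT, src, 0, MPI_COMM_WORLD, &status);
                printf("node 0 received %d from node %d\n", value, src);
            }
        } else {
            MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }

        MPI_Finalize();
        return 0;
    }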
Plan
Kennie’s manual
General Instructions for ICOM 5995 Course
-----------------------------------------
* LAM Hosts
There are a total of 8 LAM hosts (clients). These are: aramana, netlab02, albizu, netlab04, yuisa, netlab03, bayrex, betances. There is no particular order.
* Generate SSH keys
To be able to work with the system you will need to generate SSH authentication keys, so that the remote nodes will let you execute commands without an interactive login.
1. ssh-keygen -t rsa
> accept the defaults
> do not set a passphrase
2. cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
3. Log in to every node of the LAM topology.
After this you can now play with LAM.