How it Works
A High-Performance Computing system works with parallel algorithms, which are the alternatives to other algorithms executing complex calculations in a serial manner. A parallel algorithm runs multiple computing tasks at the same time on different processing units and then unites the partial outcomes to output the overall result. This can be achieved in a faster manner than the corresponding serial algorithm. In other words, breaking down a complex serial algorithm to many computing tasks that can run in parallel, is the logic behind the design and implementation of a parallel algorithm.
The computing units can be CPUs consisting of many cores or Graphical Processing Units – GPUs or a combination of both. Many servers containing CPUs and GPUs interconnected with a fast network, usually InfiniBand or Fast Ethernet, orchestrated by a workload scheduler, constitute a High Performance Computing – HPC cluster, which can be utilized by a parallel algorithm. These clusters exist in data centers on companies’ premises.
A modern solution is to use Cloud computing resources to run parallel algorithms. A more advanced solution is to use a combination of both cloud and on-premises computing resources.
Artificial Intelligence – AI applications can run on HPC systems to produce fast results. Since large amounts of data must be processed by AI applications to produce accurate and valid results in a short time, the use of HPC systems is of high importance for them. AI cannot exist without HPC.
“There are only two kinds of certain knowledge: Awareness of our own existence and the truths of Mathematics.”
– Jean le Rond d’Alembert
Solutions & Results
However, some important factors must be considered. Most modern AI applications are usually developed using Python due to the many libraries and facilities that accompany this programming language. On the one hand, Python code does not need to be built and linked like other classic programming languages. It is an interpreted language. Therefore, its execution is very slow compared to C, C++ and other languages. Besides, many libraries used by Python are built using major compiled languages. In other words, Python acts as an interface to these libraries.
On the other hand, HPC parallel programming tools are based on compiled languages. Essential HPC tools like Open Multi Processing – openMP and Message Passing Interface – MPI are implemented in C, C++. These tools are very important because allow information exchange amongst the different processors (cores) of an HPC system.
It is obvious that some interventions are required to address the compatibility of programming languages and other issues.
The use of containers can be a solution, since they allow the configuration of a computing environment independently from the underlying infrastructure. Thus, Python applications can become more scalable.
In more complex cases, like AI traditional classification algorithms, which follow a tree structure during their development, a carefully and accurately selected partition strategy of the ongoing decision tree produced nodes, and the assignment of the derived parts to new processing cores of an HPC cluster, during the parallel algorithm execution is necessary.
The reason is that there is a continuously increasing communication cost amongst the computing processors, which process the tree nodes, while the decision tree is developed from tree level to tree level. This method helps to distribute the communication cost and to balance the tree construction processing workload. For a tree partition to take place, other factors must also be taken into consideration, like the data moving cost amongst the processors, thus making this process more complex, requiring great attention, care and expertise.