Introduction

"Parallelism", or "parallel computing", is a term used to describe the practice of executing or creating processes that contain operations which can be executed simultaneously. Although the practice of parallelism has become increasingly popular in recent years, the concept originates in 1842 in L. F. Menabrea's "Sketch of the Analytical Engine Invented by Charles Babbage" [1]. Menabrea describes a process by which the operations on a set of entered numbers, and the duplication of that set, can be made to occur simultaneously. This prevents the user from having to enter the same set of numbers more than once in order to use it in many operations, and it reduces both the possibility of human error and the total execution time per input. Although this was a necessary optimization at the time, the advent of digital computing temporarily offset the need for it, because the speed with which data could be entered and operations could be performed had increased enormously. Although early digital electronic computers such as the ENIAC used a form of parallelism [2], later computers most often scheduled operations in a more serial fashion, with the exception of input and output [3].

Although the first commercially available parallel computer was launched in 1952 [4], the need for widespread parallel computing was not universally recognized until much later, when it was realized that individual processing units would soon reach their maximum speed in terms of clock frequency and floating-point operations per second (FLOPS) [3]. In recognition of this, it was determined that the most efficient way to increase computing speed was to add further processing units, an approach now known as multiprocessing.

The advent of multiprocessing had a major impact on the design of both hardware and software. Since the speed of a CPU far exceeded that of any other component, CPU-specific memory had to be enlarged to reduce the slowdown caused by storage read times [5]. To allow the cores to communicate without unnecessary latency, "bridges" had to be created between the cores, operating at a speed comparable to that of the cores themselves. To facilitate collaboration between cores on individual tasks, fast memory accessible by multiple cores became more important. This created a need for software that understands the asynchronous nature of access to these memory banks, collectively known as "caches", and that can split a task list efficiently so that its parts can be assigned to multiple cores.

Cache

"Cache" is a term commonly used to refer to high-speed memory reserved exclusively for the CPU to speed up its operations. A cache can be used as a kind of buffer, in which sizable blocks of relevant data are stored in the hope that they will prove useful (a "cache hit"), or to hold values generated by the CPU during the execution of an operation. An example of the former would be reading the next N values of a list when the first item is requested, since the rest are likely to be needed soon; an example of the latter would be holding a loop counter during a mean-average operation. Caches are organized into "levels" of speed, where the highest (level 1) is physically connected to the CPU, often to a single core. In modern CPUs, level 2 caches are normally connected to the level 1 cache of each core [6], while the level 3 cache is separate and shared by all cores. The cache architecture is designed this way to allow a tiered approach to reading: when a core requests data, the highest-level cache is read first; if the data is not found, the lower-level caches are read in succession until finally main memory is consulted.
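As a rough illustration of why cache hits matter (a sketch of my own, not drawn from the cited sources; exact figures depend on the hardware), the following C++ program sums the same array twice. The sequential pass uses each fetched cache line in full, while the strided pass jumps a whole cache line per access, so most of what is fetched is wasted:

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;   // ~16M ints, far larger than any CPU cache
    const std::size_t stride = 16;   // 16 * 4-byte ints = one 64-byte cache line
    std::vector<int> data(n, 1);

    // Times the summation of n elements selected by `index`.
    auto time_sum = [&](const char* label, auto index) {
        const auto start = std::chrono::steady_clock::now();
        long long sum = 0;
        for (std::size_t i = 0; i < n; ++i) sum += data[index(i)];
        const auto stop = std::chrono::steady_clock::now();
        std::cout << label << ": sum=" << sum << " in "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count()
                  << " ms\n";
    };

    // Sequential: consecutive addresses, so most reads are cache hits.
    time_sum("sequential", [](std::size_t i) { return i; });
    // Strided: one access per cache line, so most reads are cache misses.
    time_sum("strided   ", [&](std::size_t i) { return (i * stride) % n; });
}
```

Both passes perform the same number of additions; on a typical desktop machine the strided pass is nonetheless several times slower, which is exactly the penalty the tiered cache hierarchy exists to hide.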
Bridges

A "bridge" is a term commonly used to describe the connection between the CPU, its associated RAM, and the motherboard. In many architectures there are two bridges, called the "northbridge" and the "southbridge" [7]. The northbridge operates at a clock speed only slightly slower than that of the CPU cores themselves and is used to allow fast communication between the cores and the faster caches. The southbridge operates much more slowly than the northbridge and is used to transmit data to and from the motherboard; for this reason it is often considered the "I/O relay" of the CPU. It is worth noting, however, that Intel recently modified this architecture by moving the northbridge onto the CPU die in its Sandy Bridge microarchitecture [8], in order to reduce the need for CPU-specific components on the motherboard [9].

Parallel Programming Paradigms

Threading

Definition

"Threading" refers to the practice of separating a program into multiple distinct flows of control, or "threads", which are largely independent of one another [10]. These threads can then run concurrently and can thus significantly increase the overall execution speed of a process. Threads have access to a global memory bank and can therefore share data with one another [11], although care must be taken to ensure that this shared memory is not corrupted by asynchronous access. Most modern operating systems make extensive use of threading to improve the user experience [12]. A simple process like Microsoft Notepad may contain only one thread, while a more complex process like Google Chrome may contain many threads performing different functions. A thread managed by the operating system is known as a "kernel thread" and is typically spawned at startup; threads managed by user-controlled programs are known as "user threads" and are mapped onto a free kernel thread when they execute.

The process of creating and optimizing threads so that they can run in tandem is often referred to as "multithreading". Separate from, but related to, this is "interleaved multithreading", in which multiple virtual processors are simulated on one core and scheduled so as to minimize the latency caused by memory reads. This differs from standard multithreading in that the emphasis is on creating a block of read/write operations across all interleaved threads, rather than on asynchronous processing. The approach can be further divided into "fine-grained" multithreading (threads are swapped in a round-robin fashion), "coarse-grained" multithreading (threads are swapped when a particularly slow read occurs), "time-slice" multithreading (threads are switched after a set time has passed), and "switch-on-event" multithreading (threads are switched when the current thread must wait for input).
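To make the shared-memory model concrete, here is a minimal C++ sketch (my own illustration, not taken from the cited sources) in which four threads sum disjoint parts of a range. Each thread accumulates into a private variable and takes a lock only for the brief moment it touches the shared total, avoiding the asynchronous-access hazards described above:

```cpp
#include <iostream>
#include <mutex>
#include <thread>
#include <vector>

int main() {
    const long long n = 1'000'000;   // sum the integers 0 .. n-1
    const int n_threads = 4;
    long long total = 0;             // shared state, guarded by the mutex
    std::mutex total_mutex;

    // Each worker sums its own sub-range privately, then merges once.
    auto worker = [&](long long begin, long long end) {
        long long local = 0;         // thread-private: no locking required
        for (long long i = begin; i < end; ++i) local += i;
        std::lock_guard<std::mutex> lock(total_mutex);
        total += local;              // the only access to shared memory
    };

    std::vector<std::thread> threads;
    const long long chunk = n / n_threads;
    for (int t = 0; t < n_threads; ++t)
        threads.emplace_back(worker, t * chunk, (t + 1) * chunk);
    for (auto& th : threads) th.join();

    std::cout << "total = " << total << '\n';  // prints 499999500000
}
```

Without the mutex, concurrent updates to `total` could interleave and silently lose increments, which is precisely the kind of shared-memory corruption that threaded programs must guard against.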
Advantages

Threading enables the simultaneous completion of tasks without the use of specialized hardware, and it offers a conceptually approachable model of parallelism, allowing the programmer to create more powerful solutions.

Disadvantages

All threads within a process are affected by the state of the global variables and settings of that process, and if one thread performs an illegal operation and terminates, the process to which it belongs terminates as well.

Cluster Computing

Definition

"Cluster computing" refers to the practice of linking multiple computers together to form a larger "supercomputer"; each device on the network can be thought of as analogous to a core in a single computer. When designing a computer cluster, the physical layout and specifications of the machines that comprise it must be considered carefully with respect to the tasks the completed system is expected to perform. Tasks consisting of disparate, unconnected events (such as running a web server) may not require homogeneity among the component devices, whereas workloads with a high level of inter-process communication (such as complex modeling procedures) may require tighter coupling, and therefore component machines with similar specifications [17].

Computer clusters can be built to perform a variety of tasks, but the emphases with which they are built fall into two main categories: load balancing and high availability. A high-availability or "failover" cluster is built to ensure that the service provided is uninterrupted regardless of circumstances. This is achieved by serving requests from simple virtual machines rather than from the main operating system directly; if one of these machines fails, a duplicate can quickly be created to resume its assigned task. A load-balancing cluster attempts to ensure that all component machines within the cluster carry an equal share of the workload, in order to maximize execution efficiency.

Parallelism in these systems is commonly achieved using the Message Passing Interface, or MPI. MPI is based on the principle of sending data packets between processes in order to synchronize them and allow them to communicate [15]. This enables efficiency at both local and global scales, in both homogeneous and heterogeneous clusters, since local scheduling can be delegated to the component machines while supervision is carried out by an overall management protocol.

Advantages

One advantage of MPI is its portability: because it is based on a simple concept, it can be implemented efficiently on a wide range of hardware. MPI-2 adds support for remote memory operations and for UNIX-like file operations, allowing implementations across different operating systems [18]. Furthermore, MPI allows data to be manipulated easily regardless of its location and can compensate for differing hardware speeds among the networked computers. MPI is also relatively efficient, as it allows programmers to treat each machine as an individual unit rather than as a section of one large machine, so code can be optimized around each machine's specific peculiarities.

Disadvantages

MPI has limited support for shared-memory operations, so using it to implement a large-scale application that relies on shared memory may require more complexity than other approaches.
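The message-passing style looks quite different from threading: processes share nothing and cooperate only through explicit communication. The following C++ sketch (an illustration of the general MPI model, assuming an implementation such as Open MPI or MPICH is installed) distributes the same summation as before across several processes and combines the partial results with a single collective operation:

```cpp
#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);                // start the MPI runtime for this process

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  // this process's id
    MPI_Comm_size(MPI_COMM_WORLD, &size);  // total number of processes

    // Split the range 0 .. n-1 evenly; the last rank takes any remainder.
    const long long n = 1'000'000;
    const long long chunk = n / size;
    const long long begin = rank * chunk;
    const long long end = (rank == size - 1) ? n : begin + chunk;

    long long local = 0;                   // no memory is shared between ranks
    for (long long i = begin; i < end; ++i) local += i;

    // One message-passing step: every rank sends its partial sum,
    // and rank 0 receives the combined total.
    long long total = 0;
    MPI_Reduce(&local, &total, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) std::cout << "total = " << total << '\n';

    MPI_Finalize();
}
```

Compiled with an MPI wrapper compiler and launched with, for example, `mpirun -np 4`, the same binary runs as four cooperating processes, whether they sit on one machine or on several nodes of a cluster.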
GPGPU

Definition

General-purpose programming on graphics processing units (GPGPU) is the practice of running programs on a computer's GPU rather than its CPU. Graphics processors are specially built to facilitate the simultaneous processing of large numbers of values at once, which makes them well suited to highly data-parallel workloads.