Some workloads, or even particular inputs, perform well on GPUs, while others perform well on multicores. How do we decide which machine to buy for optimal performance on a general mix of problems? Cost is NOT taken as a factor here.
Setting cost aside, there are still many factors you have to consider.
First, you are asking which hardware to use for a given algorithm or implementation, which I think is not quite the right question, because a parallel algorithm is developed with the target hardware in mind.
I'm not going to take the same approach if my solution is for a cluster using MPI, a multicore processor using OpenMP, or a manycore processor using CUDA.
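As a rough illustration of how the approach changes with the target (a hypothetical vector-addition example, not something from the question itself), here is a minimal sketch contrasting a multicore OpenMP loop with a CUDA kernel for the same operation:

```cuda
// Multicore (OpenMP): a few coarse-grained threads iterate over shared main memory.
void add_openmp(const float* a, const float* b, float* c, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// Manycore (CUDA): thousands of fine-grained threads, one element each,
// and the data must first be copied into the GPU's own memory.
__global__ void add_cuda(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}
```

The decomposition, the data movement, and even the tuning knobs are different in each case, which is why the hardware choice cannot be made after the fact.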
So, before deciding on the hardware and the algorithm you have to analyze the problem (you may be interested in looking into Foster's methodology). How can it be decomposed (many independent tasks, a few coarse-grained tasks, etc.)? Is it regular (in its memory access pattern, in the operations done on the data)? What is the size of the data (can it fit in GPU memory, or only in main memory)? Is it memory-bound or compute-bound?
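One concrete way to answer the memory-bound question is to estimate arithmetic intensity (FLOPs per byte moved) and compare it with the machine's balance point. A minimal sketch for a hypothetical SAXPY-like update y[i] = a*x[i] + y[i]; the balance numbers are illustrative assumptions, not measurements of any real device:

```cuda
#include <cstdio>

int main() {
    // SAXPY-like element: one multiply + one add, and 12 bytes of traffic
    // (read x, read y, write y, all single-precision floats).
    const double flops_per_elem = 2.0;
    const double bytes_per_elem = 12.0;
    const double intensity = flops_per_elem / bytes_per_elem;

    // Rough machine balance (peak FLOP/s divided by peak bytes/s).
    // Illustrative assumed values only.
    const double gpu_balance = 10.0;  // FLOPs per byte a typical GPU needs to stay busy
    const double cpu_balance = 5.0;   // same idea for a multicore CPU

    printf("arithmetic intensity: %.2f FLOP/byte\n", intensity);
    printf("memory-bound on GPU? %s\n", intensity < gpu_balance ? "yes" : "no");
    printf("memory-bound on CPU? %s\n", intensity < cpu_balance ? "yes" : "no");
    return 0;
}
```

An intensity this low means the kernel will be limited by memory bandwidth on either machine, so raw FLOP ratings tell you little about which one wins.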
After this analysis you can decide on the hardware, and only then start developing the program.
Finally, once you have a working program, you should go through a performance-tuning process to maximize the performance indexes you care about (speedup, efficiency, power consumption, throughput, etc.).
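For the tuning step, the usual indexes can be computed directly from your timings. A minimal sketch with made-up numbers (substitute your own measurements):

```cuda
#include <cstdio>

int main() {
    // Illustrative timings only (seconds).
    const double t_serial   = 120.0;  // best sequential run
    const double t_parallel = 10.0;   // parallel run on p processing units
    const int    p          = 16;

    const double speedup    = t_serial / t_parallel;  // how much faster
    const double efficiency = speedup / p;            // fraction of ideal scaling

    printf("speedup: %.2fx, efficiency: %.0f%%\n", speedup, efficiency * 100.0);
    return 0;
}
```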
Dear Ahmad, I think it really depends on the kind of problem you are going to work on.
A GPU is a great way to improve overall performance, but if the threads have many dependencies among them, the resulting synchronization overhead will probably cost you performance.
GPUs are well suited to problems with many independent operations to perform, but keep an eye on that overhead.
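To make the dependency point concrete, here is a hypothetical sketch: an element-wise update maps naturally onto GPU threads, while a loop-carried recurrence does not, because each step needs the previous result.

```cuda
// Independent elements: every thread works alone, a good fit for a GPU.
__global__ void scale(float* x, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        x[i] *= a;
}

// Loop-carried dependency: x[i] needs x[i-1], so the work is inherently
// sequential. A naive one-thread-per-element GPU version gains nothing
// (parallel scan algorithms exist, but they bring their own overhead).
void prefix_sum_cpu(float* x, int n) {
    for (int i = 1; i < n; ++i)
        x[i] += x[i - 1];
}
```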