There are a few tools available; see http://en.wikipedia.org/wiki/Automatic_parallelization_tool for a list. Additionally, some proprietary compilers have options for automatic parallelization - though anything machine-generated will only be a starting point.
OpenMP directives can easily be added to loops by hand so that each iteration runs as a separate concurrent thread. The caveat is that updating a variable shared between the threads becomes problematic: the required synchronization will effectively re-serialize the code, even though it is running as separate threads.
There is a lot of research in this area, because the growth of multicore/manycore chips is forcing a rethinking of key algorithms currently in use. Simply making them multi-threaded may not be enough to guarantee correct results - the algorithm itself may have to be rewritten.
In the case of nested loops, you can help automatic or manual parallelization by choosing the correct loop order where possible (the outer loop should be the candidate for parallelization).
There are books written on this topic, and it all depends on your algorithm and how you are implementing it. In a nutshell, to get high performance, 1) don't compute what you don't have to, and 2) do things in parallel as much as you can to use all your CPU cores.
For a traditional compute-intensive application, focus on the main loops in your code. Make the iterations as independent of each other as possible, so that each iteration can run without waiting on the others. Automatic parallelization has come a long way, but I would suggest explicit parallelization using tools such as OpenMP to extract better performance.
After parallelizing the code, you should tune it (change the number of loop iterations, as well as the amount of processing in each loop) for best performance on your target hardware.
There is no easy way to do that. If you don't have time to re-design your code, here are a few things you can still do:
1. Avoid vector dependencies so that the compiler can use SIMD instructions in loops. Use "#pragma ivdep" to inform the compiler that there are no dependencies. Use vectorization reports (-ftree-vectorizer-verbose in gcc, -vec-report in icc) to confirm that a piece of code was vectorized.
2. OpenMP is a good way to turn non-parallel code into parallel code, but it gives you little fine-grained threading control. TBB gives you more control, but you still need a separate tool for vectorization (don't count on auto-vectorization alone).
3. Data organization, data organization, data organization. With poor data organization you end up with one thread trying to access data that lives in another thread's memory. Most cases I have seen where parallelization DECREASED performance were caused by threads starving for resources and by cache misses.
Some compilers, or solutions such as OpenMP, can help with automatic code parallelization, using pragmas in the case of OpenMP. However, all of these solutions tend to produce low-performance parallel code. This is mainly due to the unnecessary automatic insertion of mutexes that serialize the code, which happens because OpenMP and other parallel compilers do not understand the context of the code during compilation.
So the best way to obtain parallel code with very good performance is to write it by hand rather than rely on automatic tools. This obviously requires thinking in parallel, which is not an easy task and takes experience to do well. Most of the time, parallelizing a given piece of code as-is is a bad idea, and re-designing the initial algorithm is a much better approach.
From the point of view of changes to your source code, OpenMP is a very good choice, and it is also very efficient. But OpenMP is mainly for the shared-memory model, or for a CPU with an accelerator (GPU or Xeon Phi).
But if you are planning a migration to a distributed-memory system, such as supercomputers or clusters, MPI is the better way. Unfortunately, MPI imposes significant changes on the source code.
If you intend to use a GPU, I suggest CUDA. Look for loops where you can exploit parallelism. Anyway, you may contact me if you need help: [email protected]
You could try several things in C/C++; for example:
1. Look for special compiler flags that enable vector code such as AVX or AVX2 (for instance -mavx in GCC or -xavx in ICC).
2. Use easy programming models like CilkPlus to create parallel and vector code with very simple statements and a few keywords such as cilk_spawn or cilk_for. This programming model is integrated into GCC in its latest release.
3. Use OpenMP for multithreading, or OpenACC for parallelism on heterogeneous systems such as CPU-GPU.
In Java there are several options, for example:
1. There is a new API in JDK 8 in java.util.stream where you can use parallel patterns such as map-reduce with lambda expressions. Internally, if you call the parallel method ( IntStream.range(0, n).parallel().map( /* lambda */ ) ), Java launches a set of threads and the lambda computation is processed in parallel.
2. There is a set of classes and utilities in java.util.concurrent. This package contains classes that make multi-threaded programming easier.
3. You can use a Java binding for CUDA or OpenCL, such as jcuda or javacl.