Library with auto-tuned parallel memcpy?

David T Moore

I am a little out of my depth here, since I don't use Boost and have limited experience with parallel programming, but I don't understand why you need a parallel version of memcpy; to my knowledge no such thing exists. Can you not use the typical trick of protecting the memcpy calls with semaphores (i.e. a mutex block)? That may slow your code down a bit, I suppose, but at least you will get the benefit of using the appropriate architecture-tuned version of memcpy.

Phil Miller

A friend and I have designed and implemented exactly such a thing, with a queue of blocks to copy implemented using only atomic operations.

As you may be aware, 1 hardware thread is not actually sufficient to fully exploit memory bandwidth on a modern multicore CPU, and so memcpy between threads runs faster if they all participate.

I'll see if we can't release this code somewhere.
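For illustration only, here is a minimal sketch of the general idea, not the code mentioned above. It assumes plain C++11 `std::thread` and replaces the queue with a single atomic counter from which each thread claims the next block. The function name `parallel_memcpy_dynamic` and the 1 MiB default block size are hypothetical choices, not tuned values.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not from any library): copies n bytes from src to dst.
// Workers repeatedly claim the next fixed-size block from a shared atomic
// counter, so faster threads simply end up copying more blocks.
// The source and destination regions must not overlap.
void parallel_memcpy_dynamic(void* dst, const void* src, std::size_t n,
                             unsigned num_threads,
                             std::size_t block_size = std::size_t{1} << 20) {
    assert(num_threads > 0 && block_size > 0);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    const std::size_t num_blocks = (n + block_size - 1) / block_size;
    std::atomic<std::size_t> next_block{0};

    auto worker = [&] {
        for (;;) {
            // Claim one block index; relaxed ordering is enough because the
            // blocks are disjoint and join() below synchronizes the results.
            const std::size_t b =
                next_block.fetch_add(1, std::memory_order_relaxed);
            if (b >= num_blocks) break;
            const std::size_t off = b * block_size;
            std::memcpy(d + off, s + off, std::min(block_size, n - off));
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 1; t < num_threads; ++t) pool.emplace_back(worker);
    worker();  // the calling thread participates as well
    for (auto& th : pool) th.join();
}
```

Spawning threads per call is only for brevity. In a real application the threads that already exist would participate, and whether this beats a single memcpy depends on the platform and the copy size, so it needs measuring.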

Christian Rahn

Some years ago I worked with the commercial Intel Integrated Performance Primitives (IPP) library. Originally, it was targeted at image and signal processing. However, it should offer a parallelized version of memcpy, e.g. as ippiCopy_XYZ. This family of functions should perform quite well, especially on Intel hardware. Maybe the calling arguments suit your actual application. It also offers some control over how many threads to use.

Carl Nettelblad

David, the point is, as Phil said, that a single hardware thread will not necessarily be able to saturate memory bandwidth. In many cases it will, but not all. In an ideal NUMA setting, you should certainly be able to extract more bandwidth from the memory subsystem by engaging all memory controllers from their local cores, rather than doing multi-hop accesses. I don't want to explore the details (within this project), but I figured that this could have been done by someone else.

I'll check out the current Intel offerings, like Christian suggested. I do know that we have institution-wide licenses on some of the Intel performance products, so maybe I can at least test if there is an improvement.

Matt P. Dziubinski

> [...] our current design has a master thread copying that data to newly allocated internal data structures [...]

On a side note, have you considered experimenting with multi-threading-friendly allocation strategies? For instance, TCMalloc (Thread-Caching Malloc) from Google Performance Tools or one of the allocator templates from Intel TBB:

http://goog-perftools.sourceforge.net/

http://goog-perftools.sourceforge.net/doc/tcmalloc.html

http://software.intel.com/sites/products/documentation/doclib/tbb_sa/help/tbb_userguide/Memory_Allocation.htm

Tim Prince

Intel compilers attempt to choose a suitable version of memcpy() automatically, but not normally with automatic threading.

For fans of the Intel IPP library (which doesn't support all Intel platforms) the obscurely named optimized headers option of Intel C++ engages IPP where it is considered useful.

It may be counter-productive to concentrate on certain platform-dependent details when you aren't willing to give attention to others. For example, the current Linux glibc memcpy, as well as the Intel ones, has built-in algorithms not only to deal with variations in alignment but also to choose between cached and non-cached implementations according to data block size.

You didn't even say whether your target platform has multiple memory controllers, which stand to produce some gain by engaging multiple threads when accessing memory.

Any parallelism hidden inside a function has to observe whatever threading models may be used elsewhere in the application, e.g. OpenMP or Cilk(tm) Plus, with the underlying implementation, e.g. Microsoft threads or pthreads, chosen according to the usual conventions of the operating system. So there is a lot to deal with if you are going to second-guess your compiler and the vendors of your libraries.

Your remarks about a master thread imply that you have chosen a threading model. Why not use that model explicitly to distribute your memcpy work among a few threads?
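As a rough sketch of that last suggestion: the thread does not say which threading model the application uses, so this assumes plain `std::thread` and a hypothetical helper named `chunked_memcpy`. Each thread gets one contiguous slice and calls the ordinary, platform-tuned `memcpy` on it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <thread>
#include <vector>

// Hypothetical helper (not from any library): splits an n-byte copy into
// num_threads contiguous slices and lets each thread memcpy its own slice.
// The source and destination regions must not overlap.
void chunked_memcpy(void* dst, const void* src, std::size_t n,
                    unsigned num_threads) {
    assert(num_threads > 0);
    auto* d = static_cast<unsigned char*>(dst);
    const auto* s = static_cast<const unsigned char*>(src);
    const std::size_t base = n / num_threads;   // bytes every slice gets
    const std::size_t extra = n % num_threads;  // first `extra` slices get +1

    std::vector<std::thread> workers;
    std::size_t off = 0;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t len = base + (t < extra ? 1 : 0);
        if (len > 0) {
            // Capture by value so each thread keeps its own offset and length.
            workers.emplace_back([=] { std::memcpy(d + off, s + off, len); });
        }
        off += len;
    }
    for (auto& w : workers) w.join();
}
```

With an existing thread pool, OpenMP, or similar, the same slicing would be handed to the threads already running instead of spawning new ones. On a NUMA machine it would also matter which thread copies which slice, which this sketch ignores.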