What is an example program that realizes a performance gain by calling _mm_stream_si64x()?




The MSDN article on _mm_stream_si64x: http://msdn.microsoft.com/en-us/library/35b8kssy.aspx

2012-04-03 21:02
by Neil Justice
What do you mean? Do you want us to write it? - chikuba 2012-04-03 21:14
I mean exactly as the question is phrased. A rephrasing of the question: What is an example program that is needlessly slow due to not calling _mm_stream_si64x()? - Neil Justice 2012-04-03 21:18
I'm intrigued by the multiple downvotes I've received on this question. Would anyone care to explain? I would have never guessed that a question about how best to call an intrinsic function would elicit downvotes - Neil Justice 2012-04-03 22:20
I think it's mainly because the question is vague. What do you actually mean by a program? Possibly ask something along the lines of "in what situations would it impact performance" and see if people have some snippets of code that would give examples. Good questions are usually questions with a solution, where you can tell that the person has done most of the work before asking. - chikuba 2012-04-03 23:18


Here's an example, assuming the source and destination are sufficiently large:

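/* source and destination are assumed to point at buffers of at least 100 MB */
const char *source;
char *destination;
/* one ordinary 64-bit store per 64-byte stride (a typical cache line size) */
for (size_t offset = 0; offset < 100*1024*1024; offset += 64)
    *(__int64 *)(destination + offset) = *(const __int64 *)(source + offset);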

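If you do this manually with ordinary stores instead of using _mm_stream_si64x, every destination line you touch is pulled into the cache and evicts whatever was there, so you effectively flush the cache.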
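
For comparison, here is a minimal sketch of a streaming counterpart to that loop. It is not from the original answer: the helper name copy_stream64 and the STREAM_STORE64 macro are hypothetical, and it assumes a 64-bit x86 target. On MSVC the intrinsic is _mm_stream_si64x; on GCC and Clang the same non-temporal store is exposed as _mm_stream_si64. Unlike the loop above, it copies every 8-byte word rather than one per 64-byte stride.

```c
#include <stddef.h>

#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC (x64 only): declares _mm_stream_si64x */
#define STREAM_STORE64(dst, val) _mm_stream_si64x((dst), (val))
#else
#include <emmintrin.h>   /* GCC/Clang (x86-64): same store, named _mm_stream_si64 */
#define STREAM_STORE64(dst, val) _mm_stream_si64((dst), (val))
#endif

/* Hypothetical helper: copy `bytes` bytes from src to dst using
   non-temporal 64-bit stores. Assumes both pointers are 8-byte aligned
   and `bytes` is a multiple of 8; any remainder is ignored. */
static void copy_stream64(long long *dst, const long long *src, size_t bytes)
{
    size_t count = bytes / sizeof(long long);
    for (size_t i = 0; i < count; ++i)
        STREAM_STORE64(&dst[i], src[i]);   /* store that bypasses the cache */

    /* Non-temporal stores are weakly ordered; the fence orders them
       before any stores that follow. */
    _mm_sfence();
}
```

Something like copy_stream64(destination, source, 100*1024*1024) would cover the same 100 MB range, and is only worth considering when the destination will not be read again soon.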

2012-04-03 22:00
by MSN


As the reference says, the _mm_stream_si64x intrinsic writes the data directly to the memory location pointed to by Dest without polluting the cache. So if you want to write data through the Dest pointer, but do not plan on accessing that data again until much later, then this intrinsic would 'realize a performance gain' over an ordinary 64-bit store.
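
To check whether the gain shows up on a particular machine, one could time both kinds of store over a buffer much larger than the cache. The sketch below is only illustrative: fill_plain, fill_stream, time_fill, and the STREAM_STORE64 macro are hypothetical names, a 64-bit x86 target is assumed, and clock() is a coarse timer. The result depends on the CPU, the buffer size, and what else is competing for the cache, so no particular speedup is implied.

```c
#include <stddef.h>
#include <time.h>

#if defined(_MSC_VER)
#include <intrin.h>      /* MSVC (x64 only): declares _mm_stream_si64x */
#define STREAM_STORE64(dst, val) _mm_stream_si64x((dst), (val))
#else
#include <emmintrin.h>   /* GCC/Clang (x86-64): same store, named _mm_stream_si64 */
#define STREAM_STORE64(dst, val) _mm_stream_si64((dst), (val))
#endif

/* Ordinary stores: written lines normally end up in the cache. */
static void fill_plain(long long *dst, size_t count, long long value)
{
    for (size_t i = 0; i < count; ++i)
        dst[i] = value;
}

/* Non-temporal stores: the data is written without polluting the cache. */
static void fill_stream(long long *dst, size_t count, long long value)
{
    for (size_t i = 0; i < count; ++i)
        STREAM_STORE64(&dst[i], value);
    _mm_sfence();   /* order the streamed stores before later stores */
}

typedef void (*fill_fn)(long long *, size_t, long long);

/* Run one fill and return the elapsed clock() time in seconds. */
static double time_fill(fill_fn fill, long long *dst, size_t count, long long value)
{
    clock_t start = clock();
    fill(dst, count, value);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

A fair comparison would call time_fill with each function several times on a freshly allocated buffer, and would also time some unrelated work afterwards, since part of the benefit is that data already in the cache is not evicted.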

2012-04-03 21:46
by ds1848