I wrote a very simple single-threaded Java application that iterates (a few times) over a list of Integers and calculates the sum. When I run this on my Linux machine (Intel X5677 3.46GHz quad-core), the program takes about 5 seconds to finish. It takes the same time if I restrict the JVM to two specific cores using taskset, which was expected, since the application is single-threaded and the CPU load is < 0.1% on all cores.

However, when I restrict the JVM to a single core, the program suddenly runs extremely slowly and takes 350+ seconds to finish. I could understand it being marginally slower on a single core, since the JVM runs a few other threads in addition to the main thread, but I can't understand this extreme difference. I ran the same program on an old single-core laptop, and it finishes in about 15 seconds.

Does anyone understand what is going on here, or has anyone successfully restricted a JVM to a single core on a multicore system without experiencing something like this?

Btw, I tried this with both HotSpot 1.6.0_26-b03 and 1.7.0-b147, with the same problem.
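For anyone who wants to try reproducing this, here is a rough sketch of the kind of program described above. It is not the original code: the class name `SumBenchmark`, the list size, and the number of passes are all assumptions picked for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class SumBenchmark {
    // Sums a list of boxed Integers; each element is unboxed on access.
    static long sum(List<Integer> values) {
        long total = 0;
        for (Integer v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        int size = 1000000; // assumed list size, not from the original post
        int passes = 100;   // assumed number of iterations over the list

        List<Integer> values = new ArrayList<Integer>(size);
        for (int i = 0; i < size; i++) {
            values.add(i);
        }

        long start = System.nanoTime();
        long total = 0;
        for (int p = 0; p < passes; p++) {
            total += sum(values);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000L;

        System.out.println("total=" + total + ", elapsed=" + elapsedMs + " ms");
    }
}
```

Running it as `taskset -c 0 java SumBenchmark` pins the JVM to core 0, and `taskset -c 0,1 java SumBenchmark` pins it to two cores, so the two timings can be compared on the same machine.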
Yes, this seems counter-intuitive, but the simple solution would be to not do it. Let the JVM use 2 cores.
FWIW, my theory is that the JVM is looking at the number of cores that the operating system is reporting, assuming that it will be able to use all of them, and tuning itself based on that assumption. But the fact that you've pinned the JVM to a single core is making that tuning pessimal.
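One way to check this theory is to print the core count the JVM itself reports, with and without taskset. The sketch below uses the standard `Runtime.availableProcessors()` call; the class name `CoreCount` is just a placeholder. Whether the reported value reflects a taskset affinity mask depends on the JVM version, and I have not verified it for the versions in the question.

```java
public class CoreCount {
    // Number of processors the JVM believes it can use.
    static int visibleCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + visibleCores());
    }
}
```

If this still prints the full core count when the process is pinned to one core, the JVM is tuning itself for cores it cannot actually run on.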
One possibility is that the JVM has turned on spin-locking. That is a strategy where a thread that can't immediately acquire a lock will "spin" (repeatedly testing the lock) for a period, rather than immediately blocking and letting another thread be scheduled. This can work well if you've got multiple cores and the locks are held for a short time. But if there is only one core available, the thread holding the lock cannot run while the waiting thread spins, so the spinning only burns the time slice and spin-locking becomes an anti-optimization.
(If this is the real cause of the problem, I believe there is a JVM option you can set to turn off spinlocks.)
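To make the spinning idea concrete, here is a minimal user-level spin lock built on `AtomicBoolean`. This is only an illustration of the technique, not how HotSpot implements its monitors; `SpinLockSketch` and `runCounter` are made-up names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class SpinLockSketch {
    private final AtomicBoolean held = new AtomicBoolean(false);

    // Busy-wait until the flag can be flipped from false to true.
    void lock() {
        while (!held.compareAndSet(false, true)) {
            // Spinning: this burns CPU. With a single core, the thread
            // that holds the lock cannot run until this thread is preempted.
        }
    }

    void unlock() {
        held.set(false);
    }

    // Hypothetical helper: several threads increment a shared counter
    // under the spin lock and the final count is returned.
    static int runCounter(int threads, int perThread) {
        SpinLockSketch lock = new SpinLockSketch();
        int[] counter = {0};
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = runCounter(2, 1000000);
        long elapsedMs = (System.nanoTime() - start) / 1000000L;
        System.out.println("count=" + result + ", elapsed=" + elapsedMs + " ms");
    }
}
```

With two or more cores, a waiting thread usually gets the lock after a few failed attempts. Pinned to one core, every failed attempt is wasted work until the scheduler switches threads.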
This would be normal behaviour if you have two or more threads that depend on each other. Imagine a program where two threads are ping-ponging messages or data between them. When both threads are running, each ping-pong can take 10 - 100 ns. When the threads have to context switch in order to run, each ping-pong can take 10 - 100 microseconds. I wouldn't find a 1000x increase surprising.
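A small sketch of such a ping-pong, in case you want to measure it on your own machine. It hands values back and forth through two `SynchronousQueue`s; the class name `PingPong` and the round-trip count are assumptions, and the timings you get will depend on your hardware, OS and JVM.

```java
import java.util.concurrent.SynchronousQueue;

public class PingPong {
    // Sends `count` values to an echo thread and waits for each reply.
    // Returns how many round trips came back with the expected value.
    static long roundTrips(int count) {
        SynchronousQueue<Integer> ping = new SynchronousQueue<>();
        SynchronousQueue<Integer> pong = new SynchronousQueue<>();

        Thread echo = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    pong.put(ping.take()); // bounce each value straight back
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        echo.start();

        long completed = 0;
        try {
            for (int i = 0; i < count; i++) {
                ping.put(i);
                if (pong.take() == i) {
                    completed++;
                }
            }
            echo.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed;
    }

    public static void main(String[] args) {
        int count = 100000; // assumed number of round trips
        long start = System.nanoTime();
        long done = roundTrips(count);
        long nsPerTrip = (System.nanoTime() - start) / Math.max(1, done);
        System.out.println(done + " round trips, ~" + nsPerTrip + " ns each");
    }
}
```

Comparing the per-trip time with and without `taskset -c 0` gives a feel for how much a forced context switch on every hand-off costs.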
If you want to limit the program to one core, you may have to rewrite portions of it so that it is designed to run efficiently on one core.