Author:
Source
On behalf of Acquia I’m currently working on Drupal’s next big leap: Automatic Updates & Project Browser — both are “strategic initiatives”.
The foundation for both will be the (API-only, no UI!) package_manager
module, which builds on top of the php-tuf/composer-stager
library. We’re currently working hard to get that module committed to Drupal core before 10.1.0-alpha1
.
Over the last few weeks, we managed to solve almost all of the remaining alpha blockers (which block the core issue that will add package_manager
to Drupal core, as an alpha
-experimental module. One of those was a random test failure on DrupalCI, whose failure frequency was increasing over time!
A rare random failure may be acceptable, but at this point, ~90% of test runs were failing on one or more of the dozens of Kernel
tests … but always a different combination. Repeated investigations over the course of a month had not led us to the root cause. But now that the failure rate had reached new heights, we had to solve this. It brought the team’s productivity to a halt — imagine what damage this would have done to Drupal core’s progress!
A combination of prior research combined with the fact that suddenly the failure rate had gone up meant that there really could only be one explanation: this had to be a bug/race condition in Composer itself, because we were now invoking many more composer
commands during test execution.
Once we changed focus to composer
itself, the root cause became obvious: Composer tries to ensure the temporary directory is writable and avoids conflicts by using microtime()
. That function confusingly can return the time at microsecond resolution, but defaults to mere milliseconds — see for yourself.
With sufficiently high concurrency (up to 32 concurrent invocations on DrupalCI!), two composer
commands could be executed on the exact same millisecond:
We could switch to microtime(TRUE)
for microseconds (reduce collision probability 1000-fold) or hrtime()
(reduce collision probability by a factor of a million). But more effective would be to avoid collisions altogether. And that’s possible: composer
always runs in its own process.
Simply changing
sys_get_temp_dir() . '/temp-' . md5(microtime());
to
sys_get_temp_dir() . '/temp-' . getmypid() . '-' . md5(microtime());
is sufficient to safeguard against collisions when using Composer in high concurrency contexts.
So that single line change is what I proposed in a Composer PR a few days ago. Earlier today it was merged into the 2.5 branch — meaning it should ship in the next version!
Eventually we’ll be able to remove our work-around. But for now, this was one of the most interesting challenges along the way 🙂