If a step is cancelled just as its builder thread is starting, doBuildStep() will return sRetry. This causes builder() to make the step runnable again, since the queue monitor may have added new builds that reference it. If no such builds exist, the intent is that the step's reference count drops to zero and the step is deleted. However, if the dispatcher thread sees and locks the step before the builder thread has released its reference, the dispatcher will start a new builder thread for the step. As a result, a cancelled step can be kept alive indefinitely.
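The interleaving described above can be modelled in a single thread with `std::shared_ptr` and `std::weak_ptr`. This is a minimal sketch, not Hydra's code. `Step`, `StepPtr`, `StepWeak` and `cancelledStepSurvivesOldScheme()` are hypothetical names. The runnable queue is assumed to hold weak pointers that the dispatcher locks.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the queue runner's types (not Hydra's definitions).
struct Step { int id; };
using StepPtr = std::shared_ptr<Step>;
using StepWeak = std::weak_ptr<Step>;

// Replays the problematic ordering in one thread.
// Returns true if the cancelled step is still alive at the end.
bool cancelledStepSurvivesOldScheme()
{
    std::vector<StepWeak> runnable;          // assumed: queue of weak references
    std::vector<StepPtr> newBuilders;        // steps held by newly started builders
    StepWeak observer;                       // lets us check liveness afterwards

    {
        // Builder thread: holds a strong reference for its whole lifetime.
        StepPtr step = std::make_shared<Step>(Step{1});
        observer = step;

        // doBuildStep() returned sRetry (the step was cancelled), so the
        // builder makes the step runnable again while still holding `step`.
        runnable.push_back(step);

        // The dispatcher runs before the builder drops its reference:
        // lock() succeeds, so a new builder is started for the step.
        for (auto & w : runnable)
            if (StepPtr s = w.lock()) newBuilders.push_back(s);
        runnable.clear();
    } // the builder's strong reference is released here, too late

    return !observer.expired();
}
```

Calling `cancelledStepSurvivesOldScheme()` returns `true`. The dispatcher's `lock()` succeeds because the builder has not yet dropped its strong reference, so the step outlives the builder that was supposed to let it die.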
The fix is for State::builder() to use a weak pointer to the step, to ensure that the step's reference count can drop to zero before it's added to the runnable queue.
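Below is a sketch of the weak-pointer approach under the same single-threaded model. The names `builderRequeues()` and `stillReferenced` are hypothetical. The builder keeps only a `std::weak_ptr` across the build and releases the strong reference it was handed. It re-queues the step only if `lock()` still yields a live object, that is, only if some other owner keeps the step alive. Here, `otherOwner` stands in for new builds added by the queue monitor.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-ins for the queue runner's types (not Hydra's definitions).
struct Step { int id; };
using StepPtr = std::shared_ptr<Step>;
using StepWeak = std::weak_ptr<Step>;

// Returns true if the step was put back on the runnable queue.
bool builderRequeues(bool stillReferenced, std::vector<StepWeak> & runnable)
{
    // The reservation hands the builder a strong reference to the step.
    StepPtr reservationStep = std::make_shared<Step>(Step{1});

    // Models new builds (added by the queue monitor) that still need the step.
    StepPtr otherOwner;
    if (stillReferenced) otherOwner = reservationStep;

    // The builder itself only keeps a weak reference across the build.
    StepWeak wstep = reservationStep;

    // ... the build runs, is cancelled, and the result is a retry ...

    reservationStep.reset();                 // release the reservation's reference

    // Re-queue only if someone else still owns the step.
    if (StepPtr step = wstep.lock()) {
        runnable.push_back(step);
        return true;
    }
    return false;                            // the step has already been destroyed
}
```

With `stillReferenced == false`, the step is destroyed before the re-queue check and never reaches the runnable queue, so the dispatcher has nothing to lock.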
{
    /* Keep the active-step entry in its own scope: ActiveStep holds a
       reference to the step, and that reference must be released before
       builder() decides whether to make the step runnable again. */
    auto activeStep = std::make_shared<ActiveStep>();
    activeStep->step = reservation->step;
    activeSteps_.lock()->insert(activeStep);

    Finally removeActiveStep([&]() {
        activeSteps_.lock()->erase(activeStep);
    });

    try {
        auto destStore = getDestStore();
        res = doBuildStep(destStore, reservation, activeStep);
    } catch (std::exception & e) {
        printMsg(lvlError, format("uncaught exception building ‘%1%’ on ‘%2%’: %3%")
            % reservation->step->drvPath % reservation->machine->sshName % e.what());
    }
}
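Scoping the active-step bookkeeping in its own block only helps if `ActiveStep::step` is a strong (`shared_ptr`) reference, as the assignment from `reservation->step` suggests. That reference then has to be gone before builder() checks whether the step is still alive. The sketch below assumes `Finally` behaves like a plain scope guard that runs its callback on destruction. `ScopeGuard`, `stepAliveAfterBlock()` and the simplified `Step`/`ActiveStep` types are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <set>
#include <utility>

// Hypothetical minimal scope guard in the spirit of Finally:
// runs the stored callback when the guard goes out of scope.
class ScopeGuard
{
    std::function<void()> fun;
public:
    explicit ScopeGuard(std::function<void()> fun) : fun(std::move(fun)) { }
    ScopeGuard(const ScopeGuard &) = delete;
    ScopeGuard & operator=(const ScopeGuard &) = delete;
    ~ScopeGuard() { fun(); }
};

struct Step { };
struct ActiveStep { std::shared_ptr<Step> step; };  // assumed strong reference

// Returns whether the step is still alive once the scoped block has ended,
// given that the builder itself kept only a weak pointer.
bool stepAliveAfterBlock()
{
    std::set<std::shared_ptr<ActiveStep>> activeSteps;
    auto step = std::make_shared<Step>();
    std::weak_ptr<Step> wstep = step;

    {
        auto activeStep = std::make_shared<ActiveStep>();
        activeStep->step = std::move(step);   // the only strong reference now
        activeSteps.insert(activeStep);

        ScopeGuard removeActiveStep([&]() { activeSteps.erase(activeStep); });

        // ... build the step ...
    } // guard erases the set entry first, then activeStep is destroyed

    return !wstep.expired();
}
```

Locals are destroyed in reverse declaration order. The guard therefore removes the entry from `activeSteps` first, and `activeStep` is destroyed right after it. With both gone, the last strong reference disappears and `stepAliveAfterBlock()` returns `false`.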