hydra-queue-runner: Fix a race keeping cancelled steps alive

Nov 8, 2016, 10:42 AM
DRC26KFBZIWIFT5CMAVMCHLDYWHUMKC33R3TW2KEZ5NTSVT2AFOAC

Dependencies

  • [2] 2DNPZFPN Step cancellation: Don't use pthread_cancel()
  • [3] MHVIT4JY Split hydra-queue-runner.cc more
  • [4] KQ3EGUQY Add some instrumentation to keep track of dispatcher cost
  • [5] LVQXQIYA Kill active build steps when builds are cancelled
  • [6] KPKXKDNG hydra-queue-runner: Fix assertion failure
  • [7] 73YR46NJ hydra-queue-runner: Write directly to a binary cache
  • [8] EYR3EW6J Keep stats for the Hydra auto scaler
  • [9] NKQOEVVP Get rid of "will retry" messages after "maybe cancelling..."

Change contents

  • replacement in src/hydra-queue-runner/builder.cc at line 16
    [3.1][3.0:96](),[3.139][3.139:184]()
    auto activeStep = std::make_shared<ActiveStep>();
    activeStep->step = reservation->step;
    activeSteps_.lock()->insert(activeStep);
    [3.1]
    [3.350]
    Step::wptr wstep = reservation->step;
  • replacement in src/hydra-queue-runner/builder.cc at line 18
    [3.93][3.93:130](),[3.220][3.220:268](),[3.268][3.215:223](),[3.215][3.215:223]()
    Finally removeActiveStep([&]() {
    activeSteps_.lock()->erase(activeStep);
    });
    [3.351]
    [3.58]
    {
    auto activeStep = std::make_shared<ActiveStep>();
    activeStep->step = reservation->step;
    activeSteps_.lock()->insert(activeStep);
  • replacement in src/hydra-queue-runner/builder.cc at line 23
    [3.59][3.59:94]()
    auto step = reservation->step;
    [3.59]
    [3.388]
    Finally removeActiveStep([&]() {
    activeSteps_.lock()->erase(activeStep);
    });
  • replacement in src/hydra-queue-runner/builder.cc at line 27
    [3.389][3.389:399](),[3.399][3.1152:1193](),[3.1193][2.767:830](),[3.96][3.499:627](),[3.159][3.499:627](),[2.830][3.499:627](),[3.1261][3.499:627](),[3.499][3.499:627](),[3.627][3.160:233]()
    try {
    auto destStore = getDestStore();
    res = doBuildStep(destStore, reservation, activeStep);
    } catch (std::exception & e) {
    printMsg(lvlError, format("uncaught exception building ‘%1%’ on ‘%2%’: %3%")
    % step->drvPath % reservation->machine->sshName % e.what());
    [3.389]
    [3.687]
    try {
    auto destStore = getDestStore();
    res = doBuildStep(destStore, reservation, activeStep);
    } catch (std::exception & e) {
    printMsg(lvlError, format("uncaught exception building ‘%1%’ on ‘%2%’: %3%")
    % reservation->step->drvPath % reservation->machine->sshName % e.what());
    }
  • replacement in src/hydra-queue-runner/builder.cc at line 43
    [3.940][3.97:121]()
    if (res != sDone) {
    [3.940]
    [3.121]
    Step::ptr step = wstep.lock();
    if (res != sDone && step) {
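
Read as a whole, the hunks above appear to replace a strong reference to the step (`auto step = reservation->step;`) with a weak one (`Step::wptr wstep`), which is only promoted back to a `Step::ptr` once the build attempt has finished. The following is a minimal, self-contained sketch of that pattern, not Hydra's actual code: the `Step` and `Reservation` structs, the `owner` variable, and the `stepSurvivesBuild` function are hypothetical stand-ins introduced for illustration only.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-in for a build step (not Hydra's real Step type).
struct Step {
    std::string drvPath;
};

// Hypothetical stand-in for a machine reservation that references a step.
struct Reservation {
    std::shared_ptr<Step> step;
};

// Simulates a builder thread whose build attempt has failed.
// Returns true if the step is still alive afterwards, i.e. if the
// failure handling guarded by `res != sDone && step` would run.
bool stepSurvivesBuild(bool cancelled) {
    // Assumption: something outside the builder (e.g. a queued build)
    // owns the step; cancelling that build drops this reference.
    auto owner = std::make_shared<Step>(Step{"example.drv"});
    auto reservation = std::make_unique<Reservation>(Reservation{owner});

    // The builder keeps only a weak reference, so it does not by itself
    // keep a cancelled step alive.
    std::weak_ptr<Step> wstep = reservation->step;
    assert(!wstep.expired());

    // ... the build would run here and fail ...

    // The reservation is released, and the build may have been cancelled
    // in the meantime.
    reservation.reset();
    if (cancelled) owner.reset();

    // Promote the weak reference; this yields nullptr if the step is gone.
    std::shared_ptr<Step> step = wstep.lock();
    return step != nullptr;
}
```

With a strong `std::shared_ptr<Step>` held for the whole function, the step would stay alive in both cases. With the weak reference, `stepSurvivesBuild(true)` returns false, so the failure handling is skipped for a cancelled step.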