grahamc/hydra2 - Change SL3WSRACCX2IMJHHLTRAUQT7QDLCOKYLVO2FEHWIHXM5GPKSRJTQC

hydra-queue-runner: Limit memory usage

When using a binary cache store, the queue runner receives NARs from the build machines, compresses them, and uploads them to the cache. However, keeping multiple large NARs in memory can cause the queue runner to run out of memory. This can happen for instance when it's processing multiple ISO images concurrently.

The fix is to use a TokenServer to prevent the builder threads from storing more than a certain total size of NARs concurrently (at the moment, this is hard-coded at 4 GiB). Builder threads that would cause the limit to be exceeded block until other threads have finished.

The 4 GiB limit does not include certain other allocations, such as for xz compression or for FSAccessor::readFile(). But since these are unlikely to be more than the size of the NARs and hydra.nixos.org has 32 GiB RAM, it should be fine.
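
The blocking behaviour described in the commit message can be sketched with the standard library alone. This is a minimal illustration, not the code in this change: `MemoryBudget`, `Reservation` and `reserve()` are hypothetical names, and the sketch uses `std::mutex` and `std::condition_variable` where the change uses `nix::Sync`. It also omits the timeout parameter that the real `get()` accepts.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <stdexcept>

// Hypothetical stand-in for the TokenServer in this change.
class MemoryBudget {
    const std::size_t capacity;       // total tokens (here: bytes) available
    std::size_t inUse = 0;            // tokens currently held by reservations
    std::mutex m;                     // protects inUse
    std::condition_variable freed;    // signalled whenever tokens are returned

public:
    explicit MemoryBudget(std::size_t capacity) : capacity(capacity) {}

    // RAII handle: gives its tokens back to the budget when destroyed.
    class Reservation {
        MemoryBudget * owner;
        std::size_t bytes;

    public:
        Reservation(MemoryBudget * owner, std::size_t bytes)
            : owner(owner), bytes(bytes) {}
        Reservation(Reservation && other) noexcept
            : owner(other.owner), bytes(other.bytes) { other.owner = nullptr; }
        Reservation(const Reservation &) = delete;
        Reservation & operator=(const Reservation &) = delete;

        ~Reservation() {
            if (!owner) return; // moved-from handle owns nothing
            {
                std::lock_guard<std::mutex> lock(owner->m);
                assert(owner->inUse >= bytes);
                owner->inUse -= bytes;
            }
            owner->freed.notify_all(); // wake threads waiting in reserve()
        }
    };

    // Blocks until `bytes` tokens fit under the capacity, then claims them.
    Reservation reserve(std::size_t bytes) {
        if (bytes > capacity)
            throw std::invalid_argument("request exceeds the total budget");
        std::unique_lock<std::mutex> lock(m);
        freed.wait(lock, [&] { return inUse + bytes <= capacity; });
        inUse += bytes;
        return Reservation(this, bytes);
    }

    std::size_t currentUse() {
        std::lock_guard<std::mutex> lock(m);
        return inUse;
    }
};
```

A builder thread would hold the returned `Reservation` for as long as the NAR is in memory. Note one deliberate difference: the sketch accepts a request equal to the full capacity, whereas the code in this change rejects any request with `tokens >= maxTokens`.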

Created by Eelco Dolstra on March 9, 2016

Dependencies

In channels

main

Change contents

Replacement in src/hydra-queue-runner/build-remote.cc at line 286 [7.85]

∅:D[5.1035] → [6.1286:1404]

B:BD[6.1286] → [6.1286:1404]

        printMsg(lvlDebug, format("copying outputs of ‘%1%’ from ‘%2%’") % step->drvPath % machine->sshName);

[5.1035]

[8.816]

        MaintainCount mc(nrStepsCopyingFrom);
        auto now1 = std::chrono::steady_clock::now();

Replacement in src/hydra-queue-runner/build-remote.cc at line 293 [7.85]

∅:D[8.931] → [9.290:336]

B:BD[9.290] → [9.290:336]

        MaintainCount mc(nrStepsCopyingFrom);

[8.931]

[10.217]


        /* Query the size of the output paths. */
        size_t totalNarSize = 0;
        to << cmdQueryPathInfos << outputs;
        to.flush();
        while (true) {
            if (readString(from) == "") break;
            readString(from); // deriver
            readStrings<PathSet>(from); // references
            readLongLong(from); // download size
            totalNarSize += readLongLong(from);
        }
        printMsg(lvlDebug, format("copying outputs of ‘%s’ from ‘%s’ (%d bytes)")
            % step->drvPath % machine->sshName % totalNarSize);
        /* Block until we have the required amount of memory
           available. FIXME: only need this for binary cache
           destination stores. */
        auto resStart = std::chrono::steady_clock::now();
        auto memoryReservation(memoryTokens.get(totalNarSize));
        auto resStop = std::chrono::steady_clock::now();

Replacement in src/hydra-queue-runner/build-remote.cc at line 316 [7.85]

B:BD[10.218] → [3.177:231]

        result.accessor = destStore->getFSAccessor();

[10.218]

[3.231]

        auto resMs = std::chrono::duration_cast<std::chrono::milliseconds>(resStop - resStart).count();
        if (resMs >= 1000)
            printMsg(lvlError, format("warning: had to wait %d ms for %d memory tokens for %s")
                % resMs % totalNarSize % step->drvPath);
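
The wait-time measurement in the hunks above follows a common pattern: take a `std::chrono::steady_clock` reading before and after the blocking call, and report only when the difference crosses a threshold. A minimal standalone sketch of that pattern, assuming hypothetical helper names `elapsedMs` and `isSlowReservation` that do not appear in the change:

```cpp
#include <chrono>
#include <thread>

// Hypothetical helper: runs a blocking call and returns how long it took in
// whole milliseconds. steady_clock is monotonic, so the result is not
// affected by adjustments to the system clock.
template <typename F>
long long elapsedMs(F && blockingCall) {
    auto start = std::chrono::steady_clock::now();
    blockingCall();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(stop - start).count();
}

// Same threshold as the warning in the hunk above: one second or more.
inline bool isSlowReservation(long long ms) {
    return ms >= 1000;
}
```

In the change itself the two clock readings bracket `memoryTokens.get(totalNarSize)`, so the reported time is the time the builder thread spent blocked waiting for memory tokens.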

Replacement in src/hydra-queue-runner/build-remote.cc at line 321 [7.85]
∅:D[3.232] → [10.218:272]
B:BD[10.218] → [10.218:272]
```
        auto now1 = std::chrono::steady_clock::now();
```
[3.232]
[10.272]
```
        result.accessor = destStore->getFSAccessor();
```
Insertion in src/hydra-queue-runner/hydra-queue-runner.cc at line 20 [12.4840]
[12.7501]
[12.7501]
```
    : memoryTokens(4ULL << 30) // FIXME: make this configurable
```
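
As a quick check of the constant above: one token corresponds to one byte of NAR data, so `4ULL << 30` is the 4 GiB limit from the commit message. The names `kGiB` and `kMemoryTokenLimit` below are hypothetical and used only for this illustration.

```cpp
#include <cstdint>

// Hypothetical names, for illustration only.
constexpr std::uint64_t kGiB = 1ULL << 30;              // 1,073,741,824 bytes
constexpr std::uint64_t kMemoryTokenLimit = 4ULL << 30; // value used in the change

static_assert(kMemoryTokenLimit == 4 * kGiB, "4ULL << 30 is 4 GiB");
static_assert(kMemoryTokenLimit == 4294967296ULL, "4 GiB expressed in bytes");
// The ULL suffix matters: the value does not fit in 32 bits, so the same
// shift on a plain 32-bit int would overflow.
static_assert(kMemoryTokenLimit > UINT32_MAX, "needs a 64-bit type");
```
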
Insertion in src/hydra-queue-runner/hydra-queue-runner.cc at line 571 [12.4840]
[4.626]
[13.1628]
```
        root.attr("memoryTokensInUse", memoryTokens.currentUse());
```
Insertion in src/hydra-queue-runner/hydra-queue-runner.cc at line 595 [12.4840]
[13.2016]
[14.1730]
Insertion in src/hydra-queue-runner/hydra-queue-runner.cc at line 607 [12.4840]
[14.2189]
[15.1936]
Insertion in src/hydra-queue-runner/state.hh at line 12 [16.1128]
[16.1299]
[16.1299]
```
#include "token-server.hh"
#include "derivations.hh"
```
Deletion in src/hydra-queue-runner/state.hh at line 17 [16.1128]
B:BD[16.1342] → [16.1342:1361]
B:BD[16.1388] → [16.1388:1389]
```
#include "sync.hh"
```
Replacement in src/hydra-queue-runner/state.hh at line 18 [16.1128]
B:BD[16.1413] → [16.1413:1439]
B:BD[16.1439] → [17.1218:1261]
```
#include "derivations.hh"
#include "binary-cache-store.hh" // FIXME
```
[16.1413]
[16.1461]
```
#include "sync.hh"
```

Insertion in src/hydra-queue-runner/state.hh at line 355 [16.1128]

[18.1502]


    /* Token server to prevent threads from allocating too many big
       strings concurrently while importing NARs from the build
       machines. When a thread imports a NAR of size N, it will first
       acquire N memory tokens, causing it to block until that many
       tokens are available. */
    nix::TokenServer memoryTokens;

Insertion in src/hydra-queue-runner/token-server.hh at line 6 [19.1640]
[19.1693]
[19.1693]
```
#include "types.hh"
namespace nix {
```
Insertion in src/hydra-queue-runner/token-server.hh at line 10 [19.1640]
[19.1694]
[19.1694]
```
MakeError(NoTokens, Error)
```

Replacement in src/hydra-queue-runner/token-server.hh at line 13 [19.1640]

B:BD[19.1764] → [19.1764:1945]

   available. Calling get() will return a Token object, representing
   ownership of a token. If no token is available, get() will sleep
   until another thread returns a token. */

[19.1764]

[19.1945]

   available. Calling get(N) will return a Token object, representing
   ownership of N tokens. If the requested number of tokens is
   unavailable, get() will sleep until another thread returns a
   token. */

Replacement in src/hydra-queue-runner/token-server.hh at line 20 [19.1640]
B:BD[19.1966] → [19.1966:1994]
```
    unsigned int maxTokens;
```
[19.1966]
[19.1994]
```
    const size_t maxTokens;
```
Replacement in src/hydra-queue-runner/token-server.hh at line 22 [19.1640]
B:BD[19.1995] → [19.1995:2032]
```
    Sync<unsigned int> curTokens{0};
```
[19.1995]
[2.2375]
```
    Sync<size_t> inUse{0};
```

Replacement in src/hydra-queue-runner/token-server.hh at line 26 [19.1640]

B:BD[19.2081] → [19.2081:2148]

    TokenServer(unsigned int maxTokens) : maxTokens(maxTokens) { }

[19.2081]

[19.2148]

    TokenServer(size_t maxTokens) : maxTokens(maxTokens) { }

Insertion in src/hydra-queue-runner/token-server.hh at line 34 [19.1640]
[19.2227]
[19.2227]
```
        size_t tokens;
```

Replacement in src/hydra-queue-runner/token-server.hh at line 38 [19.1640]

B:BD[19.2259] → [19.2259:2322]

        Token(TokenServer * ts, unsigned int timeout) : ts(ts)

[19.2259]

[19.2322]

        Token(TokenServer * ts, size_t tokens, unsigned int timeout)
            : ts(ts), tokens(tokens)

Replacement in src/hydra-queue-runner/token-server.hh at line 41 [19.1640]

B:BD[19.2332] → [19.2332:2430]

            auto curTokens(ts->curTokens.lock());
            while (*curTokens >= ts->maxTokens)

[19.2332]

[19.2430]

            if (tokens >= ts->maxTokens)
                throw NoTokens(format("requesting more tokens (%d) than exist (%d)") % tokens % ts->maxTokens);
            auto inUse(ts->inUse.lock());
            while (*inUse + tokens > ts->maxTokens)

Replacement in src/hydra-queue-runner/token-server.hh at line 46 [19.1640]

B:BD[19.2461] → [19.2461:2623]

                    if (!curTokens.wait_for(ts->wakeup, std::chrono::seconds(timeout),
                            [&]() { return *curTokens < ts->maxTokens; }))

[19.2461]

[19.2623]

                    if (!inUse.wait_for(ts->wakeup, std::chrono::seconds(timeout),
                            [&]() { return *inUse + tokens <= ts->maxTokens; }))

Replacement in src/hydra-queue-runner/token-server.hh at line 50 [19.1640]

B:BD[19.2678] → [19.2678:2754]

                    curTokens.wait(ts->wakeup);
            (*curTokens)++;

[19.2678]

[19.2754]

                    inUse.wait(ts->wakeup);
            *inUse += tokens;

Replacement in src/hydra-queue-runner/token-server.hh at line 64 [19.1640]

B:BD[19.2983] → [19.2983:3105]

                auto curTokens(ts->curTokens.lock());
                assert(*curTokens);
                (*curTokens)--;

[19.2983]

[19.3105]

                auto inUse(ts->inUse.lock());
                assert(*inUse >= tokens);
                *inUse -= tokens;

Replacement in src/hydra-queue-runner/token-server.hh at line 74 [19.1640]

B:BD[19.3223] → [19.3223:3263]

    Token get(unsigned int timeout = 0)

[19.3223]

[19.3263]

    Token get(size_t tokens = 1, unsigned int timeout = 0)
    {
        return Token(this, tokens, timeout);
    }
    size_t currentUse()

Replacement in src/hydra-queue-runner/token-server.hh at line 81 [19.1640]
B:BD[19.3269] → [19.3269:3306]
```
        return Token(this, timeout);
```
[19.3269]
[19.3306]
```
        auto inUse_(inUse.lock());
        return *inUse_;
```
Insertion in src/hydra-queue-runner/token-server.hh at line 85 [19.1640]
[19.3315]
```
}
```
