When using a binary cache store, the queue runner receives NARs from the build machines, compresses them, and uploads them to the cache. However, keeping multiple large NARs in memory can cause the queue runner to run out of memory. This can happen for instance when it's processing multiple ISO images concurrently.
The fix is to use a TokenServer to prevent the builder threads to store more than a certain total size of NARs concurrently (at the moment, this is hard-coded at 4 GiB). Builder threads that cause the limit to be exceeded will block until other threads have finished.
The 4 GiB limit does not include certain other allocations, such as for xz compression or for FSAccessor::readFile(). But since these are unlikely to be more than the size of the NARs and hydra.nixos.org has 32 GiB RAM, it should be fine.
SL3WSRACCX2IMJHHLTRAUQT7QDLCOKYLVO2FEHWIHXM5GPKSRJTQC
B2L4T3X63XVYJQXEDU4WT5Y4R6PMDXGC6WN2KGOMHRQILSABNQOAC
UVNTWTWGQOFKDAJ2ROJYT4U2N4EUXKNWZWPHOM42WPLUL4ALXRJQC
BRAESISHTN4IIWUBVDMPDMY7QLMJDKX7GQ7K6NSJN66L5VPWSX3QC
73YR46NJNYZQKHA3QDJCAZYAKC2CGEF5LIS44NOIPDZU6FX6BDPQC
N4IROACVZ4MU73J5SM6WXJMKQSFR3VN5SOKENNNZNEGMTGB2Q3HAC
5AIYUMTBY6TFQTBRP3MJ2PYWUMRF57I77NIVWYE74UMEVQMBWZVQC
A2GL5FOZ3UJ2NM5RPRWTNPFTKLBA54B2UC6UIYO4M3N3RFNC4BTAC
LE4VZIY5VZ52FOP5QQRIJINWIMWTAPRTZTGO77JXUEPGRPRSQYMAC
FITVNQ2SVM6KSOF5P3HHWJYQ3WMQYDJGAONCBIZ7OF7CPXGMA36QC
6EO3HVNAAVJLBYTBSHXQADDTYF7JJU3ANHS5SKHSSROOW4AXIZ4AC
24BMQDZAWDQ7VNIA7TIROXSOYLOJBNZ2E4264WHWNJAEN6ZB3UOAC
PLOZBRTR6USSGJX7GR2RZKNPVYG2Q6QM7LW6IA35MKL63ZTQVD7QC
IK2UBDAU6QKUXHJG3SXJKYGIIXRDKI6UVRTFC6ZVDXDCGNCMEWVAC
EYR3EW6JVHNVLXMI57FUVPHQAHPETBML4H44OGJFHUT54KTTHIGQC
HJOEIMLRDVQ2KZI5HGL2HKGBM3AHP7YIKGKDAGFUNKRUXVRB24NAC
GJV2J5HXFKVF7BXNFMTI6ZZMYADXIKTFIQJ3FONJ7DPVCUJZ2L4AC
EOO4EFWD2BJCGF3ZKS2QR3XDW4WHUGH2EHSOFVK6GMI5BUBZW6QQC
MB3TISH2KYBIGY6XJKMN4HO2S6TCN2GORJENMECCKLXGGIRS2O2AC
MaintainCount mc(nrStepsCopyingFrom);
/* Query the size of the output paths. */
size_t totalNarSize = 0;
to << cmdQueryPathInfos << outputs;
to.flush();
while (true) {
if (readString(from) == "") break;
readString(from); // deriver
readStrings<PathSet>(from); // references
readLongLong(from); // download size
totalNarSize += readLongLong(from);
}
printMsg(lvlDebug, format("copying outputs of ‘%s’ from ‘%s’ (%d bytes)")
% step->drvPath % machine->sshName % totalNarSize);
/* Block until we have the required amount of memory
available. FIXME: only need this for binary cache
destination stores. */
auto resStart = std::chrono::steady_clock::now();
auto memoryReservation(memoryTokens.get(totalNarSize));
auto resStop = std::chrono::steady_clock::now();
result.accessor = destStore->getFSAccessor();
auto resMs = std::chrono::duration_cast<std::chrono::milliseconds>(resStop - resStart).count();
if (resMs >= 1000)
printMsg(lvlError, format("warning: had to wait %d ms for %d memory tokens for %s")
% resMs % totalNarSize % step->drvPath);
/* Token server to prevent threads from allocating too many big
strings concurrently while importing NARs from the build
machines. When a thread imports a NAR of size N, it will first
acquire N memory tokens, causing it to block until that many
tokens are available. */
nix::TokenServer memoryTokens;
available. Calling get() will return a Token object, representing
ownership of a token. If no token is available, get() will sleep
until another thread returns a token. */
available. Calling get(N) will return a Token object, representing
ownership of N tokens. If the requested number of tokens is
unavailable, get() will sleep until another thread returns a
token. */
auto curTokens(ts->curTokens.lock());
while (*curTokens >= ts->maxTokens)
if (tokens >= ts->maxTokens)
throw NoTokens(format("requesting more tokens (%d) than exist (%d)") % tokens);
auto inUse(ts->inUse.lock());
while (*inUse + tokens > ts->maxTokens)
}