[fix] Fix pijul treating UTF-8 text files that start with a UTF-8 BOM as binary

DzmingLi
Dec 1, 2025, 9:49 AM
66KOHBXH6YINBEZWCE2YO6G3P6BDYAORE2FMWCE25H5DH4GTYAOQC
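
The check introduced by the hunk under "Change contents" can be sketched in isolation as follows. This is only an illustration using the standard library: `roundtrips_modulo_bom` and `UTF8_BOM` are hypothetical names that do not exist in libpijul, and the `reencoded` argument stands in for the bytes produced by decoding and re-encoding the file, assuming the BOM was dropped along the way.

```rust
/// UTF-8 encoding of U+FEFF, the byte order mark.
const UTF8_BOM: &[u8] = b"\xef\xbb\xbf";

/// Hypothetical helper, not part of libpijul.
///
/// Returns true when `buffer` is byte-for-byte equal to `reencoded`,
/// or when the only difference is a single leading UTF-8 BOM in `buffer`.
fn roundtrips_modulo_bom(buffer: &[u8], reencoded: &[u8]) -> bool {
    // Exact round trip, or round trip after removing one BOM prefix.
    buffer == reencoded || buffer.strip_prefix(UTF8_BOM) == Some(reencoded)
}
```

Called with `b"\xef\xbb\xbfhello\n"` as the file contents and `b"hello\n"` as the re-encoded bytes, the helper accepts the file as text. A file whose body differs after the BOM is still rejected, which matches the intent of the change: only the BOM itself is tolerated.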

Dependencies

  • [2] HWYGVLP5 Replacing the temporary copy of chardetng with the published version
  • [3] SXEYMYF7 Fixing the bad changes in history (unfortunately, by rebooting).

Change contents

  • replacement in libpijul/src/lib.rs at line 722
    [2.438][2.438:487]()
    if encoding.encode(&s).0 == buffer {
    [2.438]
    [2.487]
    let reencoded = encoding.encode(&s).0;
    // Special handling for UTF-8 BOM: encoding_rs doesn't preserve BOM during encode,
    // but the file is still valid UTF-8 text. Check if the only difference is a UTF-8 BOM.
    if reencoded == buffer {
        return Some(e);
    } else if encoding == encoding_rs::UTF_8
        && buffer.starts_with(b"\xef\xbb\xbf")
        && &buffer[3..] == &*reencoded {
        // The file has a UTF-8 BOM prefix, which is removed during decode/encode.
        // This is still valid UTF-8 text.