For dd, I added "&> /dev/null" at the end to clean up the on-screen output, not realizing that it would redirect everything. I should have tested that: dd writes its data to stdout when the "of" option is not specified, so stdout has to stay pointed at the file. Redirecting only stderr with "2> /dev/null", as in the script below, has the effect I'm looking for.
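A minimal sketch of the difference, assuming the earlier command had "&> /dev/null" placed after the append redirection (the scratch file from mktemp and the variable names are only for illustration):

```bash
demo=$(mktemp)   # illustrative scratch file, not part of the test script

# ">> file" is applied first, then "&>" re-points both stdout and stderr
# at /dev/null, so dd's data never reaches the file
dd if=/dev/zero bs=1000 count=1 >> "$demo" &> /dev/null
size_both=$(wc -c < "$demo" | tr -d ' ')
echo "with &> /dev/null: $size_both bytes"

# Discarding only stderr hides dd's statistics but keeps the data
dd if=/dev/zero bs=1000 count=1 >> "$demo" 2> /dev/null
size_stderr=$(wc -c < "$demo" | tr -d ' ')
echo "with 2> /dev/null: $size_stderr bytes"

rm -f "$demo"
```

The first size should come out as 0 and the second as 1000, since bash applies redirections left to right.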
However, I did use ggrep in the example, which is GNU grep from brew, so that I could get support for the -P option. Mac OS has some really weird versions of these tools.
When I pipe a large amount of data to this script I do get multiple chunks, but seemingly at random I get a chunk that contains two sections instead of one. If I modify the test scripts so they use "-kN1" instead of "-k", they seem to do what I want. I've attached a new test script below.
"-N1" forces it to do one job at a time, is that correct? If so, why would that change this behavior? If not, what does -N1 do that fixes this?
#!/usr/bin/env bash
# Remove any old test results
rm -f *.test-result
# Create the test file
echo -n "Creating test file... "
echo -ne '\x00\x00\x00\x01\x67\xaa' > test.raw
dd if=/dev/zero bs=1000 count=1 >> test.raw 2> /dev/null
echo -ne '\x00\x00\x00\x01\x67\xbb' >> test.raw
dd if=/dev/zero bs=1000 count=1 >> test.raw 2> /dev/null
echo -ne '\x00\x00\x00\x01\x67\xcc' >> test.raw
dd if=/dev/zero bs=1000 count=1 >> test.raw 2> /dev/null
echo "done."
# Validate that the test file is the correct size
expected_size=3018
actual_size=$(wc -c <"test.raw")
if [ $actual_size -ne $expected_size ]; then
    echo Size is incorrect. Expected: $expected_size, actual: $actual_size
    exit 1
fi
# Count the number of times grep finds this pattern (using ggrep since we're on Mac OS)
echo -n "Instances of pattern found with grep: "
ggrep -obUaP '\x00\x00\x00\x01\x67' test.raw | wc -l
# Have GNU Parallel split up the file based on the given pattern as a regexp
cat test.raw | parallel -k --pipe --regexp --recstart '\x00\x00\x00\x01\x67' --recend '' cat\>{#}.test-result &> /dev/null
# Count the number of output files GNU Parallel created
echo -n "Output files from GNU Parallel with -k option: "
ls -la *.test-result | wc -l
# Remove the test results for the second run
rm *.test-result
# Have GNU Parallel split up the file again, this time with -N1 added
cat test.raw | parallel -kN1 --pipe --regexp --recstart '\x00\x00\x00\x01\x67' --recend '' cat\>{#}.test-result &> /dev/null
# Count the number of output files GNU Parallel created
echo -n "Output files from GNU Parallel with -kN1 option: "
ls -la *.test-result | wc -l
# Remove the test results
rm *.test-result
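
To see which chunk ends up holding two sections, a check along these lines could be run before the result files are removed. It is only a sketch: count_sections is a hypothetical helper name, and it assumes a grep with -P support (ggrep from brew on Mac OS, plain GNU grep elsewhere).

```bash
#!/usr/bin/env bash
# Hypothetical helper: count how many times the record-start pattern occurs in a file.
# Assumes GNU grep with -P support; prefers ggrep (brew on Mac OS) when present.
count_sections() {
    local grep_cmd=grep
    if command -v ggrep > /dev/null 2>&1; then
        grep_cmd=ggrep
    fi
    # LC_ALL=C makes grep treat the input as raw bytes; -o prints one line per match
    LC_ALL=C "$grep_cmd" -obUaP '\x00\x00\x00\x01\x67' "$1" | wc -l | tr -d ' '
}

# Report every result file that holds more than one section
for f in *.test-result; do
    [ -e "$f" ] || continue   # no result files: the glob stays literal, so skip it
    n=$(count_sections "$f")
    if [ "$n" -gt 1 ]; then
        echo "$f contains $n sections"
    fi
done
```

Any file it reports is one where a section was not split off on its own.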