bug-bash

Re: exit status issue


From: Dallas Clement
Subject: Re: exit status issue
Date: Tue, 22 Nov 2011 14:55:28 -0600

> That makes no sense.  Fix your quoting (it's atrocious) and then run
> the function with "set -x".  Don't throw strace at it, at this point.
> It's really not the right tool at this level of debugging.

Okay, I simplified the script slightly, fixed the quoting and re-ran
with set -x.  Here is the script and the output for the failure
scenario.

# $1 mntpoint
fsaccesstest()
{
        set -x

        local RETRY_MAX=3
        local RETRY_INTERVAL=3
        local CHK_RESULT=1
        local RETVAL=0

        TMP=`grep "$1" /proc/mounts|awk '{print $1}'`
        if [ "$TMP" != "" ]; then
                for i in `seq 1 ${RETRY_MAX}`
                do
                        touch "$1"/.accesstest
                        CHK_RESULT=$?
                        if [ ${CHK_RESULT} -eq 0 ] ; then
                                break
                        fi
                        logger -s -t ${LOGTAG} -p ${LOGFACILITY} "*** fsaccesstest test $1 failed CHK_RESULT=$CHK_RESULT. retrying... (${i}) ***"
                        sleep ${RETRY_INTERVAL}
                done
                if [ $CHK_RESULT -ne 0 ]; then
                        RETVAL=$CHK_RESULT
                fi
        fi

        set +x

        return $RETVAL
}

+ local RETRY_MAX=3
+ local RETRY_INTERVAL=3
+ local CHK_RESULT=1
+ local RETVAL=0
++ grep /mnt/array1 /proc/mounts
++ awk '{print $1}'
+ TMP=/dev/md10
+ '[' /dev/md10 '!=' '' ']'
++ seq 1 3
+ for i in '`seq 1 ${RETRY_MAX}`'
+ touch /mnt/array1/.accesstest
+ CHK_RESULT=1
+ '[' 1 -eq 0 ']'
+ logger -s -t diskmon -p local0.info '*** fsaccesstest test /mnt/array1 failed CHK_RESULT=1. retrying... (1) ***'
diskmon: *** fsaccesstest test /mnt/array1 failed CHK_RESULT=1. retrying... (1) ***
+ sleep 3
+ for i in '`seq 1 ${RETRY_MAX}`'
+ touch /mnt/array1/.accesstest
+ CHK_RESULT=0
+ '[' 0 -eq 0 ']'
+ break
+ '[' 0 -ne 0 ']'
+ set +x

> I don't even know what your actual *symptom* is.  I can't deduce it from
> a spew of strace output.  You haven't described the reason for the touch
> command, so I can only presume this is some sort of autofs environment,
> hence my attempt to solve the issue, above.

The purpose of this function is simply to create or modify a test
file on a RAID share.  It's just a periodic integrity check.

The set -x output shows that the 'touch' command failed, but it
doesn't show why.  I acknowledge the strace output is a lot to
untangle, but it does show what is happening at the process level,
including the return status of every system call.  What I'm seeing is
that many concurrent processes are running at the time this particular
shell function is executed.  Those other processes seem to have an
adverse effect on the process that is executing 'touch'.  The three
system calls that touch makes are all successful.  So my question is:
why is $? getting set to 1?  It looks to me like the bash interpreter
is returning early from this call to 'touch'.  I can't prove this yet,
though.
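
Since set -x shows only that touch failed and not why, one way to narrow
this down might be to capture whatever touch writes to stderr together
with its exit status.  The following is only a sketch: touch_with_reason
is a hypothetical helper name, not part of the script above, and it
assumes touch prints a diagnostic on failure.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original script):
# run touch, preserve its exit status, and report its stderr on failure.
# $1 file to touch
touch_with_reason()
{
        local target="$1"
        local err status

        # "2>&1 >/dev/null" sends stderr into the command substitution
        # and discards stdout.  err is declared separately above so that
        # "local" does not overwrite the exit status of touch.
        err=$(touch "$target" 2>&1 >/dev/null)
        status=$?

        if [ "$status" -ne 0 ]; then
                echo "touch $target failed, status=$status: $err" >&2
        fi
        return "$status"
}
```

Inside the retry loop this could stand in for the bare touch call, e.g.
touch_with_reason "$1"/.accesstest followed by CHK_RESULT=$?, so that
the message logged on failure carries the reason as well as the status.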

Here's another example of an unsuccessful touch execution.  Notice how
the execution keeps getting interrupted and then resumed.

[pid 31400] execve("/tmp/test-touch", ["/tmp/test-touch",
"/mnt/array1/.accesstest"], ["HOSTNAME=TS-2RVED8", "TERM=linux",
"SHELL=/bin/sh", "LIBRARY_PATH=/usr/lib:/usr/local"...,
"HUSHLOGIN=FALSE", "USER=root", "PATH=/bin:/sbin:/usr/bin:/usr/sb"...,
"MAIL=/var/spool/mail/root", "C_INCLUDE_PATH=/usr/include:/usr"...,
"PWD=/usr/local/sbin", "HOME=/root", "SHLVL=2", "LOGNAME=root",
"_=/tmp/test-touch"]Process 31401 attached (waiting for parent)

[pid 31400] <... execve resumed> )      = 0
[pid 31400] brk(0 <unfinished ...>
[pid 31400] <... brk resumed> )         = 0x2369000

[pid 31400] open("/mnt/array1/.accesstest", O_RDWR|O_CREAT, 0666
<unfinished ...>
[pid 31400] <... open resumed> )        = 3
[pid 31400] close(3 <unfinished ...>
[pid 31400] <... close resumed> )       = 0
[pid 31400] utimes("/mnt/array1/.accesstest", {4196394, 0} <unfinished
...>
[pid 31400] <... utimes resumed> )      = 0
[pid 31400] exit_group(0)               = ?
Process 23756 resumed
Process 31400 detached
[pid 23756] <... wait4 resumed> [{WIFEXITED(s) && WEXITSTATUS(s) ==
0}], 0, NULL) = 31400
[pid 23756] --- SIGCHLD (Child exited) @ 0 (0) ---
[pid 23756] wait4(-1, Process 23756 suspended


