Missing "wait" system call

I'm using 0.65.0, and external commands leave zombie processes behind, as shown below. I guess you forgot to call wait when a child dies.

 6958 ? Ssl 61:32 /usr/local/Repetier-Server/bin/RepetierServer -c /usr/local/Repetier-Server/etc/RepetierServer.xml --da
 6977 ? Z 0:00 \_ [webcam]
 7745 ? Z 0:00 \_ [omxplayer]
 7763 ? Z 0:00 \_ [webcam]
10374 ? Z 0:00 \_ [webcam]
10533 ? Z 0:00 \_ [omxplayer]
10551 ? Z 0:00 \_ [webcam]
11095 ? Z 0:00 \_ [webcam]
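For context, a process that has exited stays in the process table in state Z (as in the ps output above) until its parent collects its exit status with wait() or waitpid(). The C sketch below reproduces that on Linux: it forks a child that exits at once, observes the Z state through /proc/<pid>/stat, then reaps it. The function names proc_state() and zombie_demo() are hypothetical, and reading /proc is a Linux-only assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Return the one-letter state of `pid` from /proc/<pid>/stat (Linux only),
   or '?' if the entry cannot be read. 'Z' means zombie. */
static char proc_state(pid_t pid) {
    char path[64], line[512];
    snprintf(path, sizeof path, "/proc/%d/stat", (int)pid);
    FILE *f = fopen(path, "r");
    if (!f) return '?';
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (!ok) return '?';
    /* The command name is in parentheses and may contain spaces,
       so the state field is located after the last ')'. */
    char *end = strrchr(line, ')');
    return (end && end[1] == ' ' && end[2] != '\0') ? end[2] : '?';
}

/* Fork a child that exits immediately and do not wait for it at first.
   Returns 1 if the child showed up as a zombie ('Z') and disappeared once
   waitpid() collected its exit status, 0 otherwise, -1 on error. */
int zombie_demo(void) {
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) _exit(0);                      /* child: exit at once */

    char before = '?';
    for (int i = 0; i < 200; i++) {              /* poll for up to ~2 s */
        before = proc_state(pid);
        if (before == 'Z') break;
        struct timespec ts = {0, 10 * 1000 * 1000};   /* 10 ms */
        nanosleep(&ts, NULL);
    }

    int status;
    if (waitpid(pid, &status, 0) != pid) return -1;   /* reap the child */
    char after = proc_state(pid);                /* /proc entry is gone now */
    return before == 'Z' && after == '?' &&
           WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```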

Comments

  • Why do you think these zombies belong to repetier-server? We have no threads with these names. I think they come from kweb or your browser. I recognize omxplayer, which some browsers use to play videos. Not sure about webcam. Might it be because you started the process from a server command you wrote?

    BTW: what command did you use to show this process tree? I tried pstree on a Pi 2 that has been running for a week and got no child processes:

    init─┬─RepetierServer───18*[{RepetierServer}]
         ├─afpd───{afpd}
         ├─avahi-daemon───avahi-daemon
         ├─cnid_metad
         ├─cron
         ├─2*[dbus-daemon]
         ├─dbus-launch
         ├─dhclient
         ├─3*[ifplugd]
         ├─ntpd
         ├─rc───startpar───rc.local───rc.local───su───xinit─┬─Xorg
         │                                                  └─sh─┬─kweb───6*[{kweb}]
         │                                                       └─openbox
         ├─rsyslogd───3*[{rsyslogd}]
         ├─sshd───sshd───sshd───bash───pstree
         ├─thd
         ├─udevd───2*[udevd]
         └─wpa_supplicant

  • edited November 2015
    I don't just think these zombies belong to RepetierServer, I know they do!
    You can use the "ps fax" command to show the process tree on any Linux distribution.

    To reproduce it, add a line like this to /var/lib/Repetier-Server/database/extcommands.xml:
    <execute name="play" allowParams="false">/usr/bin/omxplayer /home/pi/sound/sound.mp3</execute>

    Then call it from the G-code of the print start event, and you'll see the omxplayer zombie after the next print.

    The "webcam" command is a script I wrote to launch mjpg-streamer at print startup and kill it when done.


  • OK, the fact that extcommands starts them explains why I didn't recognize them as server processes.


    I have looked up how I call external commands. You are right: normally I call wait to wait for the process to finish.

    In this particular case I do not wait, so as not to block the thread; otherwise the server would block until webcam finished, so in general this is good. I will see whether I can keep a list of processes and check whether they have finished in a non-blocking way, using the framework I use.
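The idea of keeping a list of started processes and checking them without blocking can be sketched in C with waitpid() and the WNOHANG flag, which returns 0 immediately if the child is still running. This is a minimal sketch, not Repetier-Server's code: child_list, spawn_async() and reap_finished() are hypothetical names, and it assumes argv[0] is an absolute path, like the paths used in extcommands.xml.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_CHILDREN 64

/* Hypothetical list of commands started without waiting for them. */
typedef struct {
    pid_t pids[MAX_CHILDREN];
    int count;
} child_list;

/* Start argv[0] (an absolute path) without waiting; remember its PID.
   Returns the PID, or -1 on error. */
pid_t spawn_async(child_list *list, char *const argv[]) {
    if (list->count >= MAX_CHILDREN) return -1;
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execv(argv[0], argv);
        _exit(127);                              /* exec failed */
    }
    list->pids[list->count++] = pid;
    return pid;
}

/* Non-blocking sweep: reap every listed child that has already exited,
   drop it from the list and return how many were reaped. Meant to be
   called periodically, e.g. from a timer or event loop. */
int reap_finished(child_list *list) {
    int reaped = 0;
    for (int i = 0; i < list->count; ) {
        int status;
        pid_t r = waitpid(list->pids[i], &status, WNOHANG);
        if (r == 0) { i++; continue; }           /* still running */
        if (r < 0 && errno == EINTR) continue;   /* interrupted: retry this PID */
        if (r > 0) reaped++;   /* exit code is available via WEXITSTATUS(status) */
        /* Finished (or unknown to the kernel): swap-remove it, then
           re-check index i, which now holds the former last entry. */
        list->pids[i] = list->pids[--list->count];
    }
    return reaped;
}
```

Calling reap_finished() from a periodic timer keeps the calling thread responsive; a finished child stays a zombie for at most one timer interval.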
  • You can register a handler for the SIGCHLD signal that calls wait on every dead child in a non-blocking manner. That way you no longer need to add a wait call after creating each child.

    For example: http://www.microhowto.info/howto/reap_zombie_processes_using_a_sigchld_handler.html
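A sketch of the SIGCHLD approach, written in the spirit of the linked article rather than copied from it: the handler loops over waitpid(-1, ..., WNOHANG) because several children can exit before it runs, and standard signals of the same kind are not queued. reaped_children, on_sigchld() and install_sigchld_reaper() are hypothetical names; the counter exists only for demonstration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Number of children reaped by the handler (demonstration only). */
static volatile sig_atomic_t reaped_children = 0;

/* SIGCHLD handler: loop until waitpid() reports nothing more to collect.
   errno is saved and restored because waitpid() may change it while the
   interrupted code is in the middle of using it. */
static void on_sigchld(int sig) {
    (void)sig;
    int saved_errno = errno;
    while (waitpid(-1, NULL, WNOHANG) > 0)
        reaped_children++;
    errno = saved_errno;
}

/* Install the handler once at startup. Returns 0 on success, -1 on error. */
int install_sigchld_reaper(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    /* SA_RESTART: restart most interrupted system calls.
       SA_NOCLDSTOP: no SIGCHLD when a child is merely stopped. */
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    return sigaction(SIGCHLD, &sa, NULL);
}
```

One trade-off: waitpid(-1, ...) reaps any child, so it can collect a process that other code is waiting for synchronously with waitpid(pid, ...); that call would then fail with ECHILD and lose the exit code. A program that mixes synchronous and asynchronous execution has to account for that, for example by only reaping PIDs it tracks as asynchronous.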
  • Hi,
    This still seems to be an issue in v1.4.9. I guess it could cause problems with long uptimes, as the zombies left by 'extcommands' accumulate.

  • @alpy How did you call these scripts? We call many scripts without them becoming zombies, so in general it works, but one of the integration paths seems to fail to query the final exit code, and that is when you get the zombies. Is this with async @execute or some other way?
  • Yes, I used @execute. Is there any other way to call external scripts?
  • Can you post the extcommands.xml so I see how you call it exactly?

    In extcommands you can also define menu entries that execute commands directly from the menu. You can also execute commands from global settings->terminal. Other cases are more internal.
  • Sure, here it is:
    <config>
        <command slug="Ender3">
            <name>Power ON printer</name>
            <execute>/home/alpy/printer_on.sh</execute>
            <local>true</local>
        </command>
        <command slug="Ender3">
            <name>Power OFF printer</name>
            <execute>/home/alpy/printer_off.sh</execute>
            <confirm>Really power off printer?</confirm>
            <local>true</local>
        </command>
        <execute name="printeron" allowParams="false">/home/alpy/printer_on.sh</execute>
        <execute name="printeroff" allowParams="false">/home/alpy/printer_off.sh</execute>
    </config>
    As you can see, I call the same script in two different ways, and I can confirm that both result in zombie processes, whether they are called via @execute or from the menu button.

    Here's the printer_on.sh script:
    #!/bin/bash
    GPIO=17

    raspi-gpio set $GPIO op
    raspi-gpio set $GPIO dh

    sleep 5

    sudo systemctl start klipper

    ... and the printer_off.sh:
    #!/bin/bash
    sudo systemctl stop klipper

    sleep 2

    GPIO=17

    raspi-gpio set $GPIO dl

  • OK, I could reproduce and fix the issue. It happens with sync="false", which is also the default if you omit the attribute. In that case the server fails to read the exit code, so the process stays a zombie. The next release will remove its zombies correctly.
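The thread does not say how the fix in the next release works. One common way to keep an asynchronous execute non-blocking while still reading the exit code is a dedicated waiter thread per child that blocks in waitpid() for that PID only. Below is a minimal C/pthreads sketch with hypothetical names (async_cmd, run_async(), wait_for_child()); reporting a killed child as 128 + signal number is borrowed from shells and is an assumption here.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical handle for one asynchronously executed command. */
typedef struct {
    pid_t pid;
    int exit_code;      /* 0..255 on normal exit, 128+signal if killed, -1 on error */
    pthread_t waiter;
} async_cmd;

/* Runs on its own thread: blocks in waitpid() for exactly this child, so
   the caller never blocks and the child is reaped as soon as it exits. */
static void *wait_for_child(void *arg) {
    async_cmd *cmd = arg;
    int status;
    pid_t r;
    do {
        r = waitpid(cmd->pid, &status, 0);
    } while (r < 0 && errno == EINTR);
    if (r < 0) cmd->exit_code = -1;
    else if (WIFEXITED(status)) cmd->exit_code = WEXITSTATUS(status);
    else if (WIFSIGNALED(status)) cmd->exit_code = 128 + WTERMSIG(status);
    else cmd->exit_code = -1;
    return NULL;
}

/* Start argv[0] (an absolute path) and return immediately.
   Returns 0 on success, -1 on error. */
int run_async(async_cmd *cmd, char *const argv[]) {
    cmd->exit_code = -1;
    cmd->pid = fork();
    if (cmd->pid < 0) return -1;
    if (cmd->pid == 0) {
        execv(argv[0], argv);
        _exit(127);                      /* exec failed */
    }
    if (pthread_create(&cmd->waiter, NULL, wait_for_child, cmd) != 0) {
        waitpid(cmd->pid, NULL, 0);      /* no thread: fall back to a blocking reap */
        return -1;
    }
    return 0;
}
```

For true fire-and-forget use, the caller would call pthread_detach() on waiter instead of joining it, and the async_cmd must then stay valid (for example, be heap-allocated) until the waiter thread finishes. This approach should not be combined with a SIGCHLD handler that calls waitpid(-1, ...), since both would compete for the same exit statuses.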
  • Thanks for your efforts.
    In fact, sync="true" seems to solve the problem, and for me it's OK to wait until the script finishes. However, there seems to be no such option for menu button items. I tried the following, but it did not help:
    <command slug="Ender3">
        <name>Power ON printer</name>
        <execute sync="true">/home/alpy/printer_on.sh</execute>
        <local>true</local>
    </command>
    Am I doing something wrong, or is it not implemented?
  • No, nothing wrong. Buttons just have no sync option, since you would never see the output, so they use the async execute that has the issue. But I'm close to releasing the update: just two bugs I'm hunting at the moment, plus a bit of testing.
  • FYI,
    I can confirm that 1.4.10 solves the issue completely and no zombies are seen anymore.

    Thanks for the great support.