Server crashes / shuts-down randomly (and offers wrong rescue-point)

Hello,
I am using Repetier-Server Pro 1.3.0 on a Raspberry Pi 3B with a printer running Marlin 2.0.9.3 on an SKR 1.4T.
Until my last Repetier-Server update I had no server problems, but now it keeps crashing randomly.

The error shows up as a blank white touchscreen (connected to the GPIO pins), which normally only happens after I shut the server down manually. The server is then also unreachable over the network. The red LED on the Pi is steadily on and the green one steadily off. It looks like a regular shutdown, but it happens by itself in the middle of a print.

The SKR 1.4 board behaves as if it doesn't notice that the server is gone: it stands still after the last command and keeps heating the bed and extruder.

At least once the server seemed to crash immediately after I opened the web interface on my phone: I saw one last move on the "Steuerung" (control) screen and then the connection-lost error.

The print logs offer no explanation; they simply stop in the middle of the print.
https://www.dropbox.com/sh/algs0cy4d13kyo0/AACui_i2zhszGVSs48AH9cH3a?dl=0

Is there something like a Repetier-Server log which I can check for errors? I can't find one.

Also worth mentioning:
With the last Repetier-Server update I also updated Marlin and activated the BTT Smart Filament Sensor (connected to the SKR), so I had to enable HOST_PAUSE_M76 and HOST_PROMPT_SUPPORT. Both seem to work flawlessly, but of course they could be connected to the server shutdowns.
---------------------------------

I don't know if this error has the same cause:
After a server crash/shutdown and reboot, it offers me the option to rescue the print.
Sadly, that only works sometimes (roughly every second or third try). Sometimes it heats up after starting the rescue but does nothing else after reaching the temperatures.
And sometimes it offers an obviously wrong restart height. Especially if I already rescued the print (e.g. at 7 mm) and it crashed again (at 13 mm), it offers me a rescue restart at the old height (again at 7 mm) with no option to choose or change the height.



Comments

  • The print logs are of no help here. In the same place where you can download the print logs, you can also download server.log, which logs the server starting and stopping. That should show a bit more.

    The syslog, downloadable from the same place, is always of interest too. Check it at the time of the crash to see if anything was happening.

    Did you start using MQTT? It is currently the only new feature I know of that can crash the server, if the MQTT server is too slow to receive data.

    If it is not MQTT, I'd be grateful if you could send a full backtrace from the moment of the crash, as described here:

    https://www.repetier-server.com/knowledgebase/debugging-crashes-hangs-on-linux/

    On the .de domain you will also find a German version :-)

    That would let me see exactly in which part of the code the crash happens (assuming it is one).
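    To compare server.log with the syslog around a crash time, a small Python sketch like the following prints only the server.log lines close to a given moment. `lines_near` is a hypothetical helper, not part of Repetier-Server, and it assumes every server.log entry starts with a `YYYY-MM-DD HH:MM:SS:` timestamp, as in the excerpts quoted in this thread.

```python
from datetime import datetime, timedelta

# server.log entries start with e.g. "2022-03-24 23:24:01: " (19-character timestamp)
TS_FORMAT = "%Y-%m-%d %H:%M:%S"


def lines_near(lines, when, minutes=5):
    """Return the log lines whose timestamp lies within +/- `minutes` of `when`."""
    window = timedelta(minutes=minutes)
    hits = []
    for line in lines:
        try:
            stamp = datetime.strptime(line[:19], TS_FORMAT)
        except ValueError:
            continue  # skip lines that do not begin with a timestamp
        if abs(stamp - when) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

    For example, pass an opened server.log and `datetime(2022, 3, 24, 23, 24)` to get everything within five minutes of the 23:24 shutdown.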

  • Hello,

    I just added the (complete) server.log to the Dropbox.
    I am not sure exactly when the printer shut down, but I would guess the print log was written until shortly before.

    So the last two crashes would have been at
    24.03.2022 17:28:41 and
    24.03.2022 23:24:00

    Here are the interesting parts:

    2022-03-24 17:26:03: Execute error response:Error: Wired connection 1 - no such connection profile.
    2022-03-24 17:26:03: While executing:/usr/bin/sudo /usr/local/Repetier-Setup/bin/manageWifiAccess ethStatus
    2022-03-24 17:26:20: Starting print recover Abtropf-Halter
    2022-03-24 17:26:21: start printjob Abtropf-Halter on printer Rostock-Marlin
    2022-03-24 17:26:21: Updating info for /var/lib/Repetier-Server/printer/RostockMarlin/jobs/00000002_Abtropf-Halter.g printer RostockMarlin
    2022-03-24 17:26:30: Time analysing /var/lib/Repetier-Server/printer/RostockMarlin/jobs/00000002_Abtropf-Halter.g:8513791 us
    2022-03-24 17:27:05: killing printjob Abtropf-Halter on printer Rostock-Marlin
    2022-03-24 17:29:04: Job created: /var/lib/Repetier-Server/printer/RostockMarlin/jobs/00000003_Abtropf-Halter.u
    [...]
    2022-03-24 17:47:28: Closing websocket for missing ping
    2022-03-24 17:47:28: Websocket opened
    2022-03-24 18:11:33: Closing websocket for missing ping
    2022-03-24 18:11:33: Websocket opened
    2022-03-24 18:48:25: Websocket: Client closed connection unexpectedly
    2022-03-24 20:26:48: Stopping for signal 15
    2022-03-24 20:26:48: Stopping MQTT subsystem ...
    2022-03-24 20:26:48: Stopping lua runner ...
    2022-03-24 20:26:48: Stopping global cloud ...
    2022-03-24 20:26:48: Stopped wifi watcher.
    2022-03-24 20:26:48: Stopping open threads ...
    2022-03-24 20:26:48: Shutting down web server.
    2022-03-24 20:26:48: Closing server
    [...]
    2022-03-24 23:24:01: Stopping for signal 15
    2022-03-24 23:24:01: Stopping MQTT subsystem ...
    2022-03-24 23:24:01: Stopping lua runner ...
    2022-03-24 23:24:01: Stopping global cloud ...
    2022-03-24 23:24:01: Stopped wifi watcher.
    2022-03-24 23:24:01: Stopping open threads ...
    2022-03-24 23:24:02: Shutting down web server.
    2022-03-24 23:24:02: Websocket opened
    2022-03-24 23:24:02: Closing server

    At least the 23:24:00 shutdown matches the timestamp of the last print log entry perfectly.

    For most of the entries I cannot say whether they should be there.
    The entries I would consider suspicious are:

    2022-03-24 17:26:03: Execute error response:Error: Wired connection 1 - no such connection profile.
    2022-03-24 17:47:28: Closing websocket for missing ping
    2022-03-24 20:26:48: Stopping for signal 15


    Since I am really not a friend of the Linux console, I would wait for your next answer before following the steps described in your link above.
  • P.S.: I didn't activate or use MQTT myself. Since it seems to be active ("Stopping MQTT subsystem ..."), it must have been activated automatically with the last update.
  • The MQTT subsystem is always started (hence the log entry), but it only does something if it is activated; otherwise it just sees that there is nothing to do.

    The good news is that it is not a real crash.
    2022-03-24 20:26:48: Stopping for signal 15
    That means Linux sent SIGTERM to the server, forcing it to shut down. From the server's side it seems to come from nowhere: there is no log entry close to that timestamp, and as you can see, the server shut down cleanly and even logged it.

    First you should check /var/log/syslog at that timestamp to see what Linux says: was it triggered by Linux itself or by an app?
    It might also come from the server itself if some uncaught error happened. Finding that would require the debugging procedure I described above to see where it stopped. If it comes from the server, it is from an uncaught exception and we would see it there.
    Other possible reasons are that Linux wanted to reboot, or that Linux ran out of resources and killed some programs to free enough memory; hence the syslog check.
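    To see at a glance which signal stopped the server each time, the "Stopping for signal N" entries can be pulled out of server.log and translated with Python's standard `signal.Signals` enum (15 is SIGTERM, 2 is SIGINT). `stop_events` is a hypothetical helper; the regular expression assumes the exact log wording quoted in this thread.

```python
import re
import signal

# server.log entry as quoted in this thread: "2022-03-24 20:26:48: Stopping for signal 15"
STOP_RE = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Stopping for signal (\d+)")


def stop_events(lines):
    """Return (timestamp, signal name) for every 'Stopping for signal N' entry."""
    events = []
    for line in lines:
        m = STOP_RE.match(line)
        if not m:
            continue
        number = int(m.group(2))
        try:
            name = signal.Signals(number).name  # e.g. 15 -> "SIGTERM"
        except ValueError:
            name = f"signal {number}"  # number not known on this platform
        events.append((m.group(1), name))
    return events
```

    Each timestamp returned can then be looked up in the syslog to see what Linux was doing at that moment.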

  • I tried to follow the procedure from your link, but wasn't really successful.

    In short, I did the following over SSH:
    sudo apt-get update
    sudo apt-get install gdb
    sudo gdb
    attach 1139
    set logging on
    sudo gdb
    c
    Then I started a print, which stopped by itself.
    Print log:
    [...]
    Recv:12:13:08.863: ok
    Send:12:13:08.863: N14620 G1 X14.176 Y-25.862 E100.55433 [end]
    Server-Log
    [...]
    2022-03-27 11:52:05: Time analysing /var/lib/Repetier-Server/printer/RostockMarlin/jobs/00000005_Abtropf-Halter.g:6698513 us
    2022-03-27 12:13:08: Stopping for signal 15
    2022-03-27 12:13:08: Stopping MQTT subsystem ...
    2022-03-27 12:13:08: Stopping lua runner ...
    2022-03-27 12:13:08: Stopping global cloud ...
    2022-03-27 12:13:08: Stopped wifi watcher.
    2022-03-27 12:13:08: Stopping open threads ...
    2022-03-27 12:13:08: Shutting down web server.
    2022-03-27 12:13:08: Closing server
    [...]
    Sys-Log:
    [...]
    2022-03-27 11:52:05: Time analysing /var/lib/Repetier-Server/printer/RostockMarlin/jobs/00000005_Abtropf-Halter.g:6698513 us
    2022-03-27 12:13:08: Stopping for signal 15
    2022-03-27 12:13:08: Stopping MQTT subsystem ...
    2022-03-27 12:13:08: Stopping lua runner ...
    2022-03-27 12:13:08: Stopping global cloud ...
    2022-03-27 12:13:08: Stopped wifi watcher.
    2022-03-27 12:13:08: Stopping open threads ...
    2022-03-27 12:13:08: Shutting down web server.
    2022-03-27 12:13:08: Closing server
    [...]
    The moment the server shut down, PuTTY lost its connection, so I had trouble following the advice "It is important to keep the console open." After the server restarted, I did the following:
    sudo gdb
    bt
    which resulted in:
    (gdb) bt
    No stack.
    Then I tried
    attach 1139
    again, which led to:
    (gdb) attach 1139
    Attaching to process 1139
    warning: unable to open /proc file '/proc/1139/status'
    warning: unable to open /proc file '/proc/1139/status'
    ptrace: No such process.
    It seems the process number changes, so I looked again with "ps aux" and found it to be 1149 now, so I did the following again:
    (gdb) attach 1149
    Attaching to process 1149
    Reading symbols from /usr/local/Repetier-Server/bin/RepetierServer...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /usr/local/Repetier-Server/bin/RepetierServer]
    (no debugging symbols found)...done.
    0x00c767a8 in sccp ()
    (gdb) bt
    #0 0x00c767a8 in sccp ()
    #1 0x00c80778 in __sigtimedwait_time64 ()
    #2 0x00c9c258 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    and:
    (gdb) thread apply all bt
    Thread 1 (process 1149):
    #0 0x00c767a8 in sccp ()
    #1 0x00c80778 in __sigtimedwait_time64 ()
    #2 0x00c9c258 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    What did I do wrong? This does not look like "a very long list with infos about all thread."... Where do I find "gdb.txt", and how can I download or open it to copy its content via SSH?
    
    
    
    
  • Reading symbols from /usr/local/Repetier-Server/bin/RepetierServer...Dwarf Error: wrong version in compilation unit header (is 5, should be 2, 3, or 4) [in module /usr/local/Repetier-Server/bin/RepetierServer]
    How old is the image? We use a newer compiler, and the error looks like the debugger does not know DWARF version 5 because it is too new for it. That version seems to exist since 2017, so I'm not sure which gdb version was the first to support it. Nonetheless, gdb should have stopped on a hard error. It looks like it did not trigger because the signal was caught or came from outside. Not sure.

    Did you already check the syslog at the same time?

    The process ID changed because the server was restarted.
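    Because the PID changes on every restart, it has to be looked up again before each `attach`. `pgrep -x RepetierServer` does this from the shell; the hypothetical Python sketch below does the same by reading `/proc/<pid>/comm` (Linux only). It assumes the process name is `RepetierServer`, matching the binary path in the gdb output above; note that the kernel truncates `comm` to 15 characters, which this name fits within.

```python
import os


def find_pids(name, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/self or /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)
```

    The result, e.g. `find_pids("RepetierServer")`, is the number to use with `attach` or `sudo gdb -p <pid>`.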
  • I have just uploaded syslog.log to the Dropbox.

    The Repetier server image should be "Repetier-Server-Image_0_85_2_v7".

    Since I had some problems getting my webcam and touchscreen to work properly, once everything was set up I saw no reason to reinstall the image.
    I thought everything necessary was kept up to date by the update function integrated in the user interface(?).

    What are my next steps?
  • OK, that is an old one. You are probably not even on Debian Buster then, which explains the debugging issue. Otherwise it is no problem so far. If you want, you can back up your data and upgrade to a newer image, but there is no need to.

    From syslog I see
    Mar 27 12:13:08 RepetierServer systemd[1]: Started Turns off Raspberry Pi display backlight on shutdown/reboot.
    meaning Linux executed a reboot command. That explains why the server shuts down: it should, and does, on reboot, so that part is good.
    There is a reboot function in the main menu. Did you call it? It has a confirmation question, so it should not be triggered by mistake. Otherwise, do you have any G-code script that triggers a reboot and might have fired here? The log does not show why it reboots, and I see no reason in the server log you sent either. I'm pretty sure it would normally write "execute ...." but I'm not 100% sure it manages to write that in the reboot case.

    I also see that your Linux has ModemManager installed. It causes problems connecting to the printer during the first 20 seconds, so you had better uninstall it.
    sudo apt-get remove modemmanager
    should work here.

    So now that we know it is not a bug, the next thing is to find out who makes Linux reboot, and why. I see no reason in the logs, so I hope you have an idea, since you configured the system. By default the only way to reboot is the reboot function in the main menu, but you would notice if you called it yourself.
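    Since the reboot can only be inferred from systemd's messages, it may help to list every such line across the whole syslog rather than reading around one timestamp. The Python sketch below is hypothetical; its marker strings are simply taken from the syslog excerpts quoted in this thread and are illustrative, not a complete list of shutdown messages.

```python
import re

# Syslog lines look like "Mar 27 12:13:08 RepetierServer systemd[1]: ..."
SYSLOG_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) (\S+) (.*)$")

# Hypothetical marker list, copied from shutdown-time messages seen in this thread
MARKERS = (
    "Turns off Raspberry Pi display backlight on shutdown/reboot",
    "Stopped target Timers",
    "Unmounting RPC Pipe File System",
)


def shutdown_hints(lines, markers=MARKERS):
    """Return (timestamp, message) for syslog lines containing any marker."""
    hits = []
    for line in lines:
        m = SYSLOG_RE.match(line)
        if m and any(marker in m.group(3) for marker in markers):
            hits.append((m.group(1), m.group(3)))
    return hits
```

    The lines just before each hit are then the place to look for whatever requested the reboot.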

  • Hello,

    I have now started over with the new SD image file "Repetier-Server-Image_1_3_0_v28.img".
    I copied no configuration files from the old SD card but typed everything in manually again.
    The error is still present, and gdb is, at least for me, as unhelpful as before.
    I added all the logs I could get. Sadly no print log was written, so I cannot say when an unwanted shutdown happened. I will try again tomorrow.
    https://www.dropbox.com/sh/camkhoqum1dqnco/AADVEc-coLbmHUyNa_V0MD63a?dl=0
    (new Dropbox folder for the new image's logs)

    Could there be any communication between the SKR 1.4 and the Raspberry Pi, bypassing Repetier-Server, that tells the Pi to shut down? As far as I remember, the Pi never shut down by itself without a running print job.

    I noticed that "Stopping for signal 2" has joined the "Stopping for signal 15" messages. It's getting worse... :(

    (I have deactivated Wi-Fi with these instructions, since the device is stationary and connected by cable:
    https://www.laub-home.de/wiki/Raspberry_Pi_Wifi_deaktivieren
    Unfortunately Raspbian does not seem to get along with that and spams the whole syslog with error messages.)
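    Whether the Wi-Fi overlay is really active can be checked from the text of /boot/config.txt: only uncommented `dtoverlay=` lines take effect, so `#dtoverlay=disable-wifi` re-enables Wi-Fi. The hypothetical Python sketch below lists the active overlays; it deliberately ignores conditional sections such as `[pi4]`, so it is only a rough check.

```python
def active_overlays(config_text):
    """Return the dtoverlay values from config.txt text, skipping commented-out lines."""
    overlays = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment, e.g. "#dtoverlay=disable-wifi"
        key, sep, value = line.partition("=")
        if sep and key.strip() == "dtoverlay":
            overlays.append(value.strip())
    return overlays
```

    If `disable-wifi` shows up here while the server's Wi-Fi settings still try to use wlan0, error messages in the syslog are to be expected.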
  • I don't know what happened now.
    I reactivated Wi-Fi to clean up the syslog a little for the error search (dtoverlay=disable-wifi => #dtoverlay=disable-wifi).
    Now RS crashes even without printing.
    Here is the gdb output:

    (gdb) attach 786
    Attaching to process 786
    [New LWP 798]
    [New LWP 807]
    [New LWP 808]
    [New LWP 809]
    [New LWP 810]
    [New LWP 811]
    [New LWP 818]
    [New LWP 819]
    [New LWP 820]
    [New LWP 821]
    [New LWP 822]
    [New LWP 823]
    [New LWP 864]
    [New LWP 1175]
    [New LWP 1238]
    [New LWP 1240]
    [New LWP 1241]
    0x00879b0e in __syscall_cp_c ()
    (gdb) bt
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x008807e6 in __sigtimedwait_time64 ()
    #2  0x00874f70 in sigwait ()
    #3  0x0001d2aa in repetier::RepetierServerApplication::main(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
    #4  0x00336e02 in Poco::Util::Application::run() ()
    #5  0x0001a836 in main ()
    (gdb) thread apply all bt
    Thread 18 (LWP 1241):
    #0  0x00879b0c in __syscall_cp_c ()
    #1  0x00881180 in __timedwait_cp ()
    #2  0x00879f8c in __pthread_cond_timedwait_time64 ()
    #3  0x76b6870c in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    Thread 17 (LWP 1240):
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x00873a20 in recvfrom ()
    #2  0x00873a02 in recv ()
    #3  0x003031bc in Poco::Net::SocketImpl::receiveBytes(void*, int, int) ()
    #4  0x0030c140 in Poco::Net::WebSocketImpl::receiveNBytes(void*, int) ()
    #5  0x0030c440 in Poco::Net::WebSocketImpl::receiveHeader(char*, bool&) ()
    #6  0x0030c774 in Poco::Net::WebSocketImpl::receiveBytes(Poco::Buffer<char>&, int, Poco::Timespan const&) ()
    #7  0x0030a98c in Poco::Net::WebSocket::receiveFrame(Poco::Buffer<char>&, int&) ()
    #8  0x001d6ec2 in repetier::WebSocketRequestHandler::handleRequest(repetier::RequestContext&) ()
    #9  0x001e5c88 in repetier::MainRequestHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) ()
    #10 0x00312e08 in Poco::Net::HTTPServerConnection::run() ()
    #11 0x003176dc in Poco::Net::TCPServerConnection::start() ()
    #12 0x0030a0ea in Poco::Net::TCPServerDispatcher::run() ()
    #13 0x00817a88 in Poco::PooledThread::run() ()
    #14 0x002b6c5c in Poco::ThreadImpl::runnableEntry(void*) ()
    #15 0x0087a4b0 in start ()
    #16 0x008812e4 in __clone ()
    #17 0x008812e4 in __clone ()
    #18 0x008812e4 in __clone ()
    [...]
    #417 0x008812e4 in __clone ()
    #418 0x008812e4 in __clone ()
    #419 0x008812e4 in __clone ()
    ^CQuit
    (gdb) set logging on
    Copying output to gdb.txt.
    (gdb) quit
    A debugging session is active.
            Inferior 1 [process 786] will be detached.
    Quit anyway? (y or n) n
    Not confirmed.
    (gdb) bt
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x008807e6 in __sigtimedwait_time64 ()
    #2  0x00874f70 in sigwait ()
    #3  0x0001d2aa in repetier::RepetierServerApplication::main(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
    #4  0x00336e02 in Poco::Util::Application::run() ()
    #5  0x0001a836 in main ()
    (gdb)


  • 2022-03-31 17:23:10: Stopping for signal 2

    Signal 2 is SIGINT, as when pressing Ctrl+C, but I also see that Linux reboots:

    Mar 31 17:23:25 RepetierServer systemd[1]: hostapd.service: Control process exited, code=exited, status=1/FAILURE
    Mar 31 17:23:25 RepetierServer systemd[1]: hostapd.service: Failed with result 'exit-code'.
    Mar 31 17:23:25 RepetierServer systemd[1]: Failed to start Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
    Mar 31 17:23:28 RepetierServer systemd[1]: hostapd.service: Service RestartSec=2s expired, scheduling restart.
    Mar 31 17:23:28 RepetierServer systemd[1]: hostapd.service: Scheduled restart job, restart counter is at 1314.
    Mar 31 17:23:28 RepetierServer systemd[1]: Stopped Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
    Mar 31 17:23:28 RepetierServer systemd[1]: Starting Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator...
    Mar 31 17:23:28 RepetierServer hostapd[14071]: Configuration file: /etc/hostapd/hostapd.conf
    Mar 31 17:23:28 RepetierServer hostapd[14071]: Could not read interface wlan0 flags: No such device
    Mar 31 17:23:28 RepetierServer hostapd[14071]: nl80211: Driver does not support authentication/association or connect commands
    Mar 31 17:23:28 RepetierServer hostapd[14071]: nl80211: deinit ifname=wlan0 disabled_11b_rates=0
    Mar 31 17:23:28 RepetierServer hostapd[14071]: Could not read interface wlan0 flags: No such device
    Mar 31 17:23:28 RepetierServer hostapd[14071]: nl80211 driver initialization failed.
    Mar 31 17:23:28 RepetierServer hostapd[14071]: wlan0: interface state UNINITIALIZED->DISABLED
    Mar 31 17:23:28 RepetierMar 31 17:17:08 RepetierServer systemd-modules-load[113]: Inserted module 'i2c_dev'
    Mar 31 17:17:08 RepetierServer systemd-sysctl[129]: Couldn't write '0' to 's/protected_symlinks', ignoring: No such file or directory
    Mar 31 17:17:08 RepetierServer fake-hwclock[117]: Thu 31 Mar 15:17:01 UTC 2022
    Mar 31 17:17:08 RepetierServer systemd-fsck[136]: e2fsck 1.44.5 (15-Dec-2018)
    Mar 31 17:17:08 RepetierServer systemd-fsck[136]: rootfs: clean, 82620/936000 files, 793775/3822976 blocks
    Mar 31 17:17:08 RepetierServer systemd[1]: Started File System Check on Root Device.


    2022-03-31 20:50:41: Stopping for signal 15

    Reboot of Linux:

    Mar 31 20:50:38 RepetierServer systemd[1]: Failed to start Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
    Mar 31 20:50:40 RepetierServer systemd[1]: Unmounting RPC Pipe File System...
    Mar 31 20:50:40 RepetierServer systemd[1]: Condition check resulted in Turns off Raspberry Pi display backlight on shutdown/reboot being skipped.
    Mar 31 20:50:40 RepetierServer systemd[1]: systemd-rfkill.socket: Succeeded.
    Mar 31 20:50:40 RepetierServer systemd[1]: Closed Load/Save RF Kill Switch Status /dev/rfkill Watch.
    Mar 31 20:50:40 RepetierServer systemd[1]: Stopped target Timers.

    What stands out is the constant attempt to start an access point (AP) with hostapd. In the server's Wi-Fi configuration you should set the AP to never be enabled, so the server does not keep trying to start it. If no Wi-Fi network is then selected and no password is known, it won't start a Wi-Fi connection at all and will just scan for available networks.

    The logs do not give any hint why it reboots. What does
    last reboot
    return? I see something like this:
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 22:45 (18840+20:44)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 14:20 (18830+12:20)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 13:52 (18830+11:52)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 16:56 (18801+14:56)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 13:50 (18767+11:50)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 14:16 (18759+12:16)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 14:12 (18759+12:12)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 18:03 (18755+16:03)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 14:44 (18745+12:44)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 09:42 (18745+07:42)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 09:02 (18745+07:02)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 08:49 (18745+06:49)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 08:49 (18745+06:49)
    reboot   system boot  5.10.17-v7+      Thu Jan  1 01:00 - 18:41 (18706+17:41)
    reboot   system boot  5.4.83-v7+       Thu Jan  1 01:00 - 15:53 (18703+14:53)
    reboot   system boot  5.4.83-v7+       Thu Jan  1 01:00 - 17:05 (18652+16:04)

    The question is whether it always says "system boot" or something else that gives a hint.

    Kernel panics can also cause reboots, but I'd expect some lines in the syslog in that case.

    Is the new system on a new SD card? A defective SD card could cause this; they can produce many strange effects.

    Also, what is the Pi model and which hardware is connected (display, power supply, hub, webcams...)? Since it happens frequently, you might try without the printer connected; you can use the virtual cartesian port to simulate a printer instead. It might also be good to start without any extra hardware at all and, once you are fairly sure the usual time until a reboot has passed, add one component at a time and retest. This is really mysterious: it is a clean reboot without any hint as to why, so we need to suspect everything and work from a good, working setup toward the final configuration to see at which addition the reboots start. The last component or change is then most likely the cause.
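    To answer the "always system boot?" question over many boots at once, the output of `last reboot` (or `last -x`, which also lists shutdown entries) can be tallied with a short Python sketch. `summarize_last` is a hypothetical helper and assumes the column layout shown above: entry type, "system", boot/down, kernel version, then the times.

```python
from collections import Counter


def summarize_last(output):
    """Count (entry type, kernel) pairs in `last reboot` or `last -x` output.

    Only "system" entries such as "reboot system boot" or "shutdown system down"
    are counted; run-level lines and the trailing "wtmp begins" line are skipped.
    """
    counts = Counter()
    for line in output.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == "system":
            counts[(fields[0], fields[3])] += 1
    return counts
```

    The text can come from e.g. `subprocess.run(["last", "-x"], capture_output=True, text=True).stdout`. A clean shutdown should normally also leave a `shutdown system down` entry, so noticeably fewer shutdown than reboot entries may hint at power loss rather than an orderly reboot.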
  • I nearly forgot the following:

    (gdb) thread apply all bt
    Thread 18 (LWP 1241):
    #0  0x00879b0c in __syscall_cp_c ()
    #1  0x00881180 in __timedwait_cp ()
    #2  0x00879f8c in __pthread_cond_timedwait_time64 ()
    #3  0x76b6870c in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    Thread 17 (LWP 1240):
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x00873a20 in recvfrom ()
    #2  0x00873a02 in recv ()
    #3  0x003031bc in Poco::Net::SocketImpl::receiveBytes(void*, int, int) ()
    #4  0x0030c140 in Poco::Net::WebSocketImpl::receiveNBytes(void*, int) ()
    #5  0x0030c440 in Poco::Net::WebSocketImpl::receiveHeader(char*, bool&) ()
    #6  0x0030c774 in Poco::Net::WebSocketImpl::receiveBytes(Poco::Buffer<char>&, int, Poco::Timespan const&) ()
    #7  0x0030a98c in Poco::Net::WebSocket::receiveFrame(Poco::Buffer<char>&, int&) ()
    #8  0x001d6ec2 in repetier::WebSocketRequestHandler::handleRequest(repetier::RequestContext&) ()
    #9  0x001e5c88 in repetier::MainRequestHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) ()
    #10 0x00312e08 in Poco::Net::HTTPServerConnection::run() ()
    #11 0x003176dc in Poco::Net::TCPServerConnection::start() ()
    #12 0x0030a0ea in Poco::Net::TCPServerDispatcher::run() ()
    #13 0x00817a88 in Poco::PooledThread::run() ()
    #14 0x002b6c5c in Poco::ThreadImpl::runnableEntry(void*) ()
    #15 0x0087a4b0 in start ()
    #16 0x008812e4 in __clone ()
    #17 0x008812e4 in __clone ()
    [...]
    #33 0x008812e4 in __clone ()
    #34 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #35 0x008812e4 in __clone ()
    #36 0x008812e4 in __clone ()
    [...]
    #77 0x008812e4 in __clone ()
    #78 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #123 0x008812e4 in __clone ()
    #124 0x008812e4 in __clone ()
    [...]
    #165 0x008812e4 in __clone ()
    #166 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #167 0x008812e4 in __clone ()
    #168 0x008812e4 in __clone ()
    [...]
    #209 0x008812e4 in __clone ()
    #210 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #211 0x008812e4 in __clone ()
    #212 0x008812e4 in __clone ()
    [...]
    #253 0x008812e4 in __clone ()
    #254 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #255 0x008812e4 in __clone ()
    #256 0x008812e4 in __clone ()
    [...]
    #297 0x008812e4 in __clone ()
    #298 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #299 0x008812e4 in __clone ()
    #300 0x008812e4 in __clone ()
    [...]
    #341 0x008812e4 in __clone ()
    #342 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #343 0x008812e4 in __clone ()
    #344 0x008812e4 in __clone ()
    [...]
    #385 0x008812e4 in __clone ()
    #386 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--<RET>
    #387 0x008812e4 in __clone ()
    #388 0x008812e4 in __clone ()
    [...]
    #429 0x008812e4 in __clone ()
    #430 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--q
    Quit
    (gdb) quit
    A debugging session is active.
            Inferior 1 [process 786] will be detached.
    Quit anyway? (y or n) y
    Detaching from program: /usr/local/Repetier-Server/bin/RepetierServer, process 786
    [Inferior 1 (process 786) detached]

    I replaced a lot of nearly identical lines with [...].
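    The manual "[...]" trimming of the repeated `__clone` frames can be automated. The hypothetical Python sketch below merges consecutive gdb frame lines with identical text into one summary line and passes everything else through unchanged. Such endless identical frames are probably an unwinding artifact (matching gdb's "corrupt stack?" warning) rather than real recursion, so collapsing them loses nothing useful.

```python
import re

# gdb frame lines look like "#17 0x008812e4 in __clone ()"
FRAME_RE = re.compile(r"^\s*#(\d+)\s+(.*?)\s*$")


def _format_run(first, last, text):
    """Render one frame, or a run of identical frames, as a single line."""
    if first == last:
        return f"#{first} {text}"
    return f"#{first}-#{last} {text} (repeated {last - first + 1}x)"


def collapse_frames(lines):
    """Merge consecutive gdb frames with identical text into one summary line."""
    out = []
    run = None  # [first frame number, last frame number, frame text]
    for line in lines:
        m = FRAME_RE.match(line)
        if m and run and m.group(2) == run[2]:
            run[1] = int(m.group(1))  # same frame again: extend the current run
            continue
        if run:
            out.append(_format_run(*run))
            run = None
        if m:
            run = [int(m.group(1)), int(m.group(1)), m.group(2)]
        else:
            out.append(line.rstrip())  # thread headers etc. pass through unchanged
    if run:
        out.append(_format_run(*run))
    return out
```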
  • and one more directly after an in-print-shutdown

    (gdb) attach 787
    Attaching to process 787
    [New LWP 799]
    [New LWP 808]
    [New LWP 809]
    [New LWP 810]
    [New LWP 811]
    [New LWP 812]
    [New LWP 816]
    [New LWP 817]
    [New LWP 818]
    [New LWP 819]
    [New LWP 820]
    [New LWP 821]
    [New LWP 829]
    [New LWP 872]
    [New LWP 918]
    [New LWP 1177]
    [New LWP 1240]
    [New LWP 1241]
    [New LWP 1245]
    [New LWP 1259]
    [New LWP 1516]
    [New LWP 1561]
    0x00879b0e in __syscall_cp_c ()
    (gdb) bt
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x008807e6 in __sigtimedwait_time64 ()
    #2  0x00874f70 in sigwait ()
    #3  0x0001d2aa in repetier::RepetierServerApplication::main(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&) ()
    #4  0x00336e02 in Poco::Util::Application::run() ()
    #5  0x0001a836 in main ()
    (gdb) thread apply all bt

    Thread 23 (LWP 1561):
    #0  0x00879b0c in __syscall_cp_c ()
    #1  0x00881180 in __timedwait_cp ()
    #2  0x00879f8c in __pthread_cond_timedwait_time64 ()
    #3  0x00000000 in ?? ()
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)

    Thread 22 (LWP 1516):
    #0  0x00879b0e in __syscall_cp_c ()
    #1  0x00873a20 in recvfrom ()
    #2  0x00873a02 in recv ()
    #3  0x003031bc in Poco::Net::SocketImpl::receiveBytes(void*, int, int) ()
    #4  0x0030c140 in Poco::Net::WebSocketImpl::receiveNBytes(void*, int) ()
    #5  0x0030c440 in Poco::Net::WebSocketImpl::receiveHeader(char*, bool&) ()
    #6  0x0030c774 in Poco::Net::WebSocketImpl::receiveBytes(Poco::Buffer<char>&, int, Poco::Timespan const&) ()
    #7  0x0030a98c in Poco::Net::WebSocket::receiveFrame(Poco::Buffer<char>&, int&) ()
    #8  0x001cef80 in repetier::WebcamSender::websocketStream(repetier::RequestContext&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) ()
    #9  0x001d13b2 in repetier::PrinterRequestHandler::CameraVideoWebsocket(repetier::RequestContext&) ()
    #10 0x001e5c88 in repetier::MainRequestHandler::handleRequest(Poco::Net::HTTPServerRequest&, Poco::Net::HTTPServerResponse&) ()
    #11 0x00312e08 in Poco::Net::HTTPServerConnection::run() ()
    #12 0x003176dc in Poco::Net::TCPServerConnection::start() ()
    #13 0x0030a0ea in Poco::Net::TCPServerDispatcher::run() ()
    #14 0x00817a88 in Poco::PooledThread::run() ()
    #15 0x002b6c5c in Poco::ThreadImpl::runnableEntry(void*) ()
    #16 0x0087a4b0 in start ()
    #17 0x008812e4 in __clone ()
    #18 0x008812e4 in __clone ()
    #19 0x008812e4 in __clone ()
    #20 0x008812e4 in __clone ()
    #21 0x008812e4 in __clone ()
    #22 0x008812e4 in __clone ()
    #23 0x008812e4 in __clone ()
    #24 0x008812e4 in __clone ()
    #25 0x008812e4 in __clone ()
    #26 0x008812e4 in __clone ()
    #27 0x008812e4 in __clone ()
    #28 0x008812e4 in __clone ()
    #29 0x008812e4 in __clone ()
    #30 0x008812e4 in __clone ()
    #31 0x008812e4 in __clone ()
    #32 0x008812e4 in __clone ()
    #33 0x008812e4 in __clone ()
    #34 0x008812e4 in __clone ()
    --Type <RET> for more, q to quit, c to continue without paging--
  •  > and one more directly after an in-print-shutdown
    You mean you rebooted and then attached the debugger? That is useless here. What we need is to attach the debugger, continue, and wait for the server to stop due to an error that would make it crash.

    From past logs it is quite clear that it is not a server crash, since the server writes all its shutdown messages. So the server does not crash; it just gets stopped because of a reboot signaled by Linux (Linux stops apps by sending them a signal to stop).
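    The attach/continue/wait sequence can be prepared as a single command so nothing has to be typed at the moment of failure. The hypothetical Python sketch below only builds the gdb command line. `-p` and `-ex` are standard gdb options, `set pagination off` avoids the `--Type <RET>` prompts seen in the posts above, and `set logging` writes to gdb.txt in the current directory unless another file is set. As noted above, this only helps with a real crash: if Linux reboots, gdb is stopped along with everything else.

```python
import shlex


def gdb_attach_command(pid, log_file="gdb.txt"):
    """Build an argument list that attaches gdb to `pid`, logs to `log_file`,
    continues the process and prints all thread backtraces once it stops."""
    commands = [
        "set pagination off",  # avoid the "--Type <RET> for more" prompts
        "set logging file " + log_file,
        "set logging on",  # gdb 12+ prefers "set logging enabled on"
        "continue",  # let the server run until it stops
        "thread apply all bt",  # dump every thread's stack at that moment
    ]
    args = ["sudo", "gdb", "-p", str(pid)]
    for command in commands:
        args += ["-ex", command]
    return args


def as_shell(args):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in args)
```

    `print(as_shell(gdb_attach_command(786)))` prints a line that can be pasted into the SSH session, with the current PID in place of 786.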
  • Maybe I changed too many things at once (Repetier-Server update + Marlin changes + BTT Smart Filament Sensor).
    I have now unplugged everything but the SKR from the Pi, and have since finished 2 prints with more than an hour of print time. I hope this is not a coincidence...

    Maybe the new cabling caused some kind of loose contact on my hardware off push button. I hope this is not a coincidence.
    But it is still very strange that it showed effects after such random intervals and only while printing.

    After a few successful prints I will reactivate the filament sensor and see if the error reappears.

    But if it really was only a loose contact, I have to apologize for the false alarm and thank you for your help finding the error!



  • > my hardware off push button
    What did you add? Something that triggers a reboot/shutdown? I have seen some electronics that do that to avoid unplugging, since having to cut the power to restart is a really bad thing with Pis. But since we are looking for a reboot trigger, that sounds like it. So maybe test a bit more without it and then add it again.
  • I added the switch long before you added the option to the touch interface. And since there is now a hole for it in the front panel of my printer, I kept it in use.
    I implemented it like this (following a tutorial):

    A file named "shutdown_button.py" with the following content:
    #! /usr/bin/env python
    import os
    import RPi.GPIO as GPIO
    GPIO.setmode(GPIO.BCM)
    # BCM GPIO26 (physical pin 37) set up as input, pulled up to stop false signals
    GPIO.setup(26, GPIO.IN, pull_up_down=GPIO.PUD_UP)
    try:
        while True:
            # wait for the pin to be shorted to GND; a single falling edge is enough (no debouncing)
            GPIO.wait_for_edge(26, GPIO.FALLING)
            # halt the Pi
            os.system("/sbin/shutdown -h now")
    finally:
        # release the GPIO pin however the script ends
        GPIO.cleanup()
    and started from the crontab:

    @reboot root /usr/bin/python /home/pi/shutdown_button.py
    Regarding the title of this thread: "Server crashes / shuts-down randomly (and offers wrong rescue-point)"
    Would you expect the rescue-point behaviour i mentioned in my opening post with my possibly self-constructed shutdown-error?

    It would be a really nice feature if it were a bit more robust or offered a few more options.
    For example, measuring the height of what is on the print bed and then correcting the restart height (and maybe entering a percentage for how much of that layer was completed; a rendered preview of the restart point in the selected layer would probably be overkill...).
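    Regarding the shutdown_button.py script quoted above: it halts on the very first falling edge, so a loose contact or electrical noise on the button wire would be enough to run `shutdown -h now` (a hypothesis consistent with the loose-contact suspicion in this thread). A minimal sketch of a guard, assuming the pin reads 0 while the button is pressed: only treat the press as real if the pin stays low for a while. `held_low` is a hypothetical helper; the clock and sleep functions are parameters so the logic can be tested without hardware.

```python
import time


def held_low(read_pin, hold_s=1.0, poll_s=0.05, clock=time.monotonic, sleep=time.sleep):
    """Return True only if read_pin() keeps returning 0 (pressed) for hold_s seconds.

    read_pin: callable returning the current pin level, e.g. lambda: GPIO.input(26).
    """
    deadline = clock() + hold_s
    while clock() < deadline:
        if read_pin() != 0:
            return False  # released early, or just a glitch: ignore it
        sleep(poll_s)
    return read_pin() == 0
```

    In the script, call `held_low(lambda: GPIO.input(26))` right after `GPIO.wait_for_edge(26, GPIO.FALLING)` and only run the shutdown command when it returns True; a real press then needs to be held for about a second.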


  • > Would you expect the rescue-point behaviour i mentioned in my opening post with my possibly self-constructed shutdown-error?
    No, I only thought of the rescue point as the result of a shutdown, since that is what it was made for.

    If you have the G-code in Models, you could indeed check the heights and compare them with the proposed start position. The proposed position is normally very close to the real one, since we log all moves sent to the printer. Only if Linux did not flush the last block might the last few commands be missing.

    Selecting the start position in an interactive 3D preview is a very good idea, though. I think I will add this to the Pro version, since we are adding a 3D preview during printing for Pro anyway. It is the same function, and seeing it live while comparing with the print is maybe the most comfortable way to decide where to continue. Maybe I will even add an option to start printing from a given point without rescue, to save prints for users who do not have rescue enabled. We will see.
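    Checking the heights in the G-code, as suggested above, can be done with a small script that finds where each new Z height starts and which file line first reaches a measured height. This is a hypothetical Python sketch, not how Repetier-Server computes its rescue point; it assumes absolute coordinates (G90), no Z-hop and no vase mode, and it reports 1-based line numbers of the file, not the N numbers the server sends to the printer.

```python
def layer_starts(gcode_lines):
    """Return (line number, z) for every G0/G1 move that raises Z to a new maximum.

    Assumes absolute coordinates (G90) and no Z-hop; text after ';' is ignored.
    """
    starts = []
    top = None
    for number, raw in enumerate(gcode_lines, start=1):
        words = raw.split(";", 1)[0].split()
        if not words or words[0].upper() not in ("G0", "G1"):
            continue
        for word in words[1:]:
            if word[:1].upper() != "Z":
                continue
            try:
                z = float(word[1:])
            except ValueError:
                continue  # malformed Z word
            if top is None or z > top:
                top = z
                starts.append((number, z))
    return starts


def resume_line(gcode_lines, height):
    """Return the first line number at which the print reaches `height` mm, or None."""
    for number, z in layer_starts(gcode_lines):
        if z >= height - 1e-6:  # tolerate float rounding in the G-code values
            return number
    return None
```

    Comparing `resume_line(open("model.gcode"), 13.0)` with the line the rescue dialog proposes would show whether the offered height (7 mm in the opening post) really matches the last printed layer.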