Server crashes / shuts-down randomly (and offers wrong rescue-point)
Hello,
I am using Repetier-Server Pro 1.3.0 on a Raspberry Pi 3B with a printer running Marlin 2.0.9.3 on an SKR 1.4T.
Until my last Repetier-Server update I had no server problems, but now it keeps crashing randomly.
The error shows up as a blank white touchscreen (installed on the GPIO pins). Normally the screen is only white after the server has been shut down manually. The server is then also not reachable via the network interface. The red LED on the Pi is steady on, the green one steady off. It looks like a regular shutdown, but it happens by itself in the middle of a print.
The SKR 1.4 board behaves as if it does not notice the absence of the server: it stands still after the last command and keeps heating the bed and extruder.
At least once it seemed as if the server crashed immediately after I opened the network interface on my phone, since I saw one last move on the "Steuerung" (control) screen and then the connection-lost error.
The print logs offer no explanation - they seem to have simply stopped in the middle of the print.
https://www.dropbox.com/sh/algs0cy4d13kyo0/AACui_i2zhszGVSs48AH9cH3a?dl=0
Is there something like a Repetier-Server log which I can check for errors? I can't find it.
Also worth mentioning:
With the last Repetier-Server update I also updated Marlin and activated the BTT Smart Filament Sensor (connected to the SKR), and so had to activate HOST_PAUSE_M76 and HOST_PROMPT_SUPPORT. Both seemed to work flawlessly, but of course they could be connected to the server shutdowns.
---------------------------------
I don't know if this error has the same cause:
After a server crash/shutdown and the following reboot, it offers me the option to rescue the print.
Sadly that works only sometimes (about every second to third try). Sometimes it heats up after starting the rescue but does nothing else after reaching the temperatures.
And sometimes it obviously offers a wrong restart height. Especially if I have already rescued the print (e.g. at 7 mm) and it crashed again (at 13 mm), it offers me a rescue restart at the old height (again at 7 mm), with no option to choose or change the height.
Comments
The syslog is also always of interest; it can be downloaded at the same position. Check it at the time of the crash to see if anything is happening.
Did you start using MQTT? It is currently the only new feature I know of that can crash the server, if the MQTT server is too slow to receive data.
If it is not MQTT, I'd be grateful if you could send a full traceback from the moment of the crash, as described here:
https://www.repetier-server.com/knowledgebase/debugging-crashes-hangs-on-linux/
On the .de domain you will also find a German entry :-)
That would allow me to see exactly at which part of the code the crash happens (assuming it is one).
I just added the (complete) server.log to the Dropbox.
I am not sure when the printer shut down, but I would guess the print logs were written until shortly before it.
So the last two crashes would have been at
24.03.2022 17:28:41 and
24.03.2022 23:24:00
Here are the interesting parts:
At least at 23:24:00 it fits the timestamp of the last print-log entry perfectly.
For most of the entries I cannot say whether they should be there.
Entries that I would consider suspicious are:
Since I am absolutely no friend of the Linux console, I would wait for your next answer before doing the steps described in your link above.
The good thing is that it is no real crash.
That means Linux sent SIGTERM to the server, forcing it to shut down. From the server side it seems to come from nowhere - there is no log entry close to that timestamp and, as you can see, the server shuts down cleanly and even logs it.
First you should check /var/log/syslog at that timestamp to see what Linux says: was it triggered by Linux or by an app?
It might also come from the server itself if some uncaught error happened. That would require the debug solution I described above to see where it stopped; if it comes from the server, it is from an uncaught exception and we would see it there.
Other reasons would be that Linux wanted to reboot, or that Linux ran out of resources, in which case it would also kill some software to free enough memory - hence the syslog check.
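For anyone who prefers not to scroll through the file by hand, the syslog check can be sketched in a few lines of Python. This is only an illustration: the function name `syslog_lines_near` and the 60-second window are assumptions, and it expects the traditional syslog timestamp style (`Mar 31 20:50:38 ...`), which carries no year, so the year is taken from the moment you pass in.

```python
from datetime import datetime, timedelta


def syslog_lines_near(lines, moment, window_seconds=60):
    """Return the syslog lines whose timestamp lies within the window around `moment`.

    Assumes each line starts with a 15-character stamp like 'Mar 31 20:50:38'.
    Lines that do not start with such a stamp are skipped.
    """
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        try:
            # Syslog stamps have no year, so borrow it from the requested moment.
            stamp = datetime.strptime(
                f"{moment.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue
        if abs(stamp - moment) <= window:
            hits.append(line.rstrip("\n"))
    return hits
```

With the second shutdown time reported above it could be used like this (reading /var/log/syslog usually needs sufficient permissions):

```python
with open("/var/log/syslog", errors="replace") as handle:
    for entry in syslog_lines_near(handle, datetime(2022, 3, 24, 23, 24, 0)):
        print(entry)
```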
In short, I did the following in ssh:
Then I started a print, which stopped by itself.
Print Log:
Server-Log:
Sys-Log:
At the moment the server shut down, PuTTY lost the connection, so I had some problems following the advice "It is important to keep the console open."
After the server restart I did the following steps:
which resulted in:
Then I tried the following again:
which led to:
It seems the process number is changing, so I looked again with "ps aux" and found it to be 1149 now. So I did the following again:
and:
What did I do wrong? This does not look like "a very long list with infos about all thread."... Where do I find the "gdb.txt", and how can I download or open it to copy its content via ssh?
Did you check syslog at the same time already?
Process id changed since server was restarted.
The Repetier server image should be "Repetier-Server-Image_0_85_2_v7".
Since I had some problems getting my webcam and touchscreen to work properly, I saw no reason to refresh the image once it was set up.
What are my next steps?
From syslog I see
meaning Linux executed the reboot command. So that explains why the server shuts down - it should, and does, on reboot, so that is good.
There is a reboot function in the main menu. Did you call it? It has a security question, so it should not be possible to trigger it by mistake. Otherwise, do you have any G-code script triggering a reboot that might run here? The log does not contain anything on why it reboots, and in the server log you sent I see no reason either. I'm pretty sure it would normally write "execute ....", but I'm not 100% sure it manages to write that in the reboot case.
I also see your Linux has ModemManager installed. This causes problems connecting to the printer during the first 20 seconds. You had better uninstall it.
sudo apt-get remove modemmanager
should work here.
So now that we know it is not a bug, the next thing is to find out who reboots Linux and why. But I see no reason in the logs, so I hope you have an idea, as you configured the system. By default the only way to reboot is the main menu reboot function, but you would notice if you called it yourself.
I have now started over with the new SD image file "Repetier-Server-Image_1_3_0_v28.img".
I did not copy any configuration file from the old SD card, but typed everything in manually again.
The error is still present and gdb is - at least for me - as useful as before.
I added all the logs I could get. Sadly no print.log was written, so I cannot say when an unwanted shutdown happened. I will try again tomorrow.
https://www.dropbox.com/sh/camkhoqum1dqnco/AADVEc-coLbmHUyNa_V0MD63a?dl=0
(new Dropbox folder for new images logs)
Can there be any communication between the SKR 1.4 and the Raspberry Pi, bypassing Repetier-Server, which tells the Pi to shut down? As far as I remember, the Pi never shut down by itself without a running print job.
I noticed that "Stopping for signal 2" messages have now joined the "Stopping for signal 15" messages - it's getting worse... :(
(I have deactivated WIFI with these instructions, since the device is operated stationary by cable:
https://www.laub-home.de/wiki/Raspberry_Pi_Wifi_deaktivieren
unfortunately Raspbian does not seem to get along with it and spams the whole sys.log with error messages.)
I reactivated Wi-Fi to clean up the syslog a little bit for the error search (dtoverlay=disable-wifi => #dtoverlay=disable-wifi).
Now RS crashes even without printing.
Here is the gdb output:
Signal 2 is SIGINT, like when pressing Ctrl+C, but I also see that Linux reboots:
2022-03-31 20:50:41: Stopping for signal 15
Reboot of linux.
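To read such server.log entries without looking up signal numbers, a small Python sketch can translate them. It assumes the line format quoted above (`YYYY-MM-DD HH:MM:SS: Stopping for signal N`); the function name `describe_stop` is made up for illustration.

```python
import re
import signal
from datetime import datetime

# Matches lines such as "2022-03-31 20:50:41: Stopping for signal 15".
_STOP_LINE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): Stopping for signal (\d+)"
)


def describe_stop(line):
    """Return (timestamp, signal name) for a 'Stopping for signal' line, else None."""
    match = _STOP_LINE.match(line.strip())
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    # signal.Signals maps the numeric value to its symbolic name.
    name = signal.Signals(int(match.group(2))).name
    return stamp, name
```

For the line quoted above this yields SIGTERM, the signal Linux sends to running services when it shuts down or reboots; signal 2 yields SIGINT.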
Mar 31 20:50:38 RepetierServer systemd[1]: Failed to start Advanced IEEE 802.11 AP and IEEE 802.1X/WPA/WPA2/EAP Authenticator.
What is apparent is the constant attempt to activate the access point with hostapd - in the server's Wi-Fi configuration you should set the AP to never be enabled, so the server does not constantly try to start it. If it then has no Wi-Fi selected and does not know a password, it won't start a Wi-Fi connection at all and will just check the available networks.
The logs do not show any hint of why it reboots. What does
return? I see something like this:
The question is whether it is always a system boot or something else that gives a hint.
Kernel panics are also a possible reason for a reboot, but I'd expect some lines in the syslog in that case.
Is the new system on a new SD card? A defective SD card could cause this; such defects can have many strange results.
Also, what is the Pi type and which hardware is connected (display, power, hub, webcams...)? Since it happens frequently, you might try without the printer connected - you can use the virtual cartesian as port to simulate a printer instead. It might also be good to start without any extra hardware at all and, once you are quite sure the usual time for a reboot has passed, add one component at a time and retest. This is really mysterious, as it is a clean reboot without any hint as to why, so we need to suspect everything and work from a known-good setup towards the final configuration to see at which addition it starts to reboot. The last component or change is then most likely the reason.
I replaced a lot of nearly similar lines with [...]
You mean you rebooted and then went into the debugger? That is useless here. What we need is to attach the debugger, continue, and wait for the server to stop due to an error that would make it crash.
From the past logs it is quite clear that it is no server crash, as it writes all the shutdown messages. So the server does not crash, it just gets stopped due to a reboot signaled by Linux (Linux stops apps by signaling them to stop).
I have now unplugged everything but the SKR from the Pi, and have now finished 2 prints with over an hour of print time each. I hope this is not a coincidence...
Maybe the new cabling caused a kind of loose contact on my hardware power-off push button.
But it is still very strange that it showed effects after such a random time and only while printing.
After a few successful prints I will reactivate the filament sensor and see if the error reappears.
But if it really was only a loose contact, I have to apologize for the false alarm and thank you for your help in finding the error!
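If the spurious shutdowns do turn out to come from a loose contact on the power-off button, one common mitigation is to act only when the input stays pressed for a while, so that short glitches are ignored. Below is a minimal sketch of that idea in plain Python, without any GPIO library. The function name, the sample format and the 2-second threshold are assumptions for illustration, and this is not the script running on this Pi.

```python
def held_long_enough(samples, hold_seconds=2.0):
    """Decide whether a button was really held, given polled samples.

    `samples` is an iterable of (timestamp_in_seconds, pressed) pairs in
    time order. Returns True only if the input stayed pressed without
    interruption for at least `hold_seconds`.
    """
    pressed_since = None
    for timestamp, pressed in samples:
        if not pressed:
            # Any release, however short, restarts the measurement.
            pressed_since = None
        elif pressed_since is None:
            pressed_since = timestamp
        elif timestamp - pressed_since >= hold_seconds:
            return True
    return False
```

A shutdown script would then trigger the shutdown only when this check passes, instead of reacting to the first edge it sees on the pin.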
What did you add? Something triggering a reboot/shutdown? I have seen some electronics that do that to avoid unplugging - it is a really bad thing with Pis that you need to power them off to restart. But since we are searching for a reboot trigger, that sounds like it. So maybe test a bit more without it and then add it again.
I implemented it this way (following a tutorial):
File named "shutdown_button.py" with the following content:
Regarding the title of this thread: "Server crashes / shuts-down randomly (and offers wrong rescue-point)"
Would you expect the rescue-point behaviour I mentioned in my opening post with my possibly self-constructed shutdown error?
It would be a really nice feature if it were a bit more robust or offered a few more options.
For example, to measure the height of what is on the print bed and then correct the restart height. (And maybe to enter a percentage value for how far the layer at that height was completed. A rendered preview of the restart point in the selected layer would possibly be overkill...)
No, I only thought of the rescue point as the result of a shutdown, as that is what it was made for.
If you have the G-code in models you could indeed check the heights and compare them with the proposed start position. The proposed position is normally very close to the real position, since we log all moves sent to the printer. Only if Linux did not flush the last block might the last x commands be missing.
Selecting the start position in an interactive 3D preview is a very good idea though. I think I will add this for the Pro version, as we are adding a 3D preview during print for Pro anyway. So it is the same function, and seeing that live and comparing it with the print is maybe the most comfortable way to decide where to continue. Maybe I will even add an option to start printing from a given point without rescue, to save prints for users who do not have rescue enabled. We will see.
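The manual height check suggested above (comparing the G-code with the proposed start position) can be sketched in Python. This is a rough illustration and not part of Repetier-Server: the function names are made up, and it assumes absolute positioning with plain `G0`/`G1` moves as the first word on a line. Z-hops in the file will show up as extra height changes.

```python
import re

# A Z word such as "Z0.4", "Z7" or "Z.2" inside a move command.
_Z_WORD = re.compile(r"Z\s*(-?\d*\.?\d+)", re.IGNORECASE)


def layer_starts(gcode_lines):
    """Return (line_number, z) for every G0/G1 move that changes the Z height."""
    starts = []
    current_z = None
    for number, raw in enumerate(gcode_lines, start=1):
        code = raw.split(";", 1)[0].strip()  # drop comments
        if not code:
            continue
        if code.split()[0].upper() not in ("G0", "G1"):
            continue
        match = _Z_WORD.search(code)
        if match:
            z = float(match.group(1))
            if z != current_z:
                starts.append((number, z))
                current_z = z
    return starts


def closest_layer(starts, measured_z):
    """Pick the height change closest to a height measured on the bed."""
    return min(starts, key=lambda item: abs(item[1] - measured_z))
```

Measuring the printed part with calipers and passing that value to `closest_layer` gives the G-code line where that height begins, which can then be compared with the restart height the rescue dialog proposes.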