Uncertain if Bug or Covid-19
Hi guys,
Been using the repetier-server for roughly 3 months. Occasionally, every 2 to 3 weeks or so, it decides to ruin my day and halt ongoing prints. This time, I managed to catch some logs and also did some digging in the forum - and I think it is beyond my skill level to handle (silly lil cosplayer - I know more about Anime than about Raspberroes).
I'm running Repetier-Server 1.3.0 on a Raspberry Pi 3B (1GB mem), connected to the original Pi power supply (which is connected to a UPS battery).
I have two printers attached to the pi - Pulse X (by mattercontrol, just a dumb lil' variant of MK2 printers with a Rambo board. HA! Cosplaygirl nailed it! I hope) and Prusa mk3s.
I noticed that every time it failed, it happened when the Prusa was already mid-print and I started a new print with the Pulse. It once happened when I just connected the USB of the Pulse (after calibrating it), but mostly it happens when I just upload an STL file and start a Pulse print.
I don't recall it halting when handling Prusa prints. However, for Prusa, I print directly from the Prusa Slicer (not uploading any STL - but had configured the slicer to communicate directly). By the way, I wish I could do the same with Pulse - but. yeah. that. Nope.
LOGS! I got logs!! and I went through the effort of trying to make sense of them. I noticed these threads (but I'm not sure I can find a solution through them to be honest):
https://forum.repetier.com/discussion/7041/printer-suddenly-stops-mid-print
https://forum.repetier.com/discussion/5865/out-of-memory-crash-repetier-server-on-raspberry-pi
There was another post I looked through (about mid-print stopping) from 2022 I think.
All I can say is - that from my understanding - repetier-server crashes, and maybe even experiences a "memory leak" (I was actually taking this from the 2nd link above - they seem to have the same error I get in the logs).
To make my post decent and clean, I'll share the logs in the first comment.
And if I made a mistake in the forum (and posted it wrongfully in "general") I apologize.
Comments
This crash happened EXACTLY after uploading a file to Pulse X and asking to print it. So at 16:55 the fun actually starts, I think.
repetier-server log file
Syslog Log
Daemon Log File
Looking into syslog
oom-kill is the Linux kernel's out-of-memory killer: it kills an application when Linux is running out of memory. It chose the server because it was the biggest memory consumer at that moment. Actually, its size was more than it normally should be.
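To check whether the OOM killer was involved in a particular crash, the kernel messages can be searched directly. The following is only a minimal sketch, assuming kernel messages end up in /var/log/syslog (systems without that file may need journalctl instead) and that the kernel uses its usual "Out of memory" / "oom-kill" wording. find_oom_events is a hypothetical helper name, not something shipped with Repetier-Server.

```shell
#!/bin/bash
# find_oom_events is a hypothetical helper, not part of Repetier-Server.
# It prints every line of a log file that mentions the kernel OOM killer.
find_oom_events() {
    local log_file="${1:-/var/log/syslog}"   # assumed log location
    # -i: ignore case, -E: extended regex so "|" means "or"
    grep -iE 'oom-kill|out of memory' "$log_file"
}
```

Running `find_oom_events /var/log/syslog` (possibly with sudo, depending on file permissions) prints the matching lines with their timestamps, which can then be compared with the time the print stopped.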
If you log in over ssh on your pi you can run
htop
to see memory usage. With 1GB it should normally show 2-3% if you do not have a lot of files and printers connected.
This is most likely independent of prusa printing. Actually it is not clear what it is. If you say it happens frequently when you upload a file to that printer, please watch memory usage while you upload a test file. If that indeed happens then, please send me a sample g-code so I can analyse why that happens.
You might also try updating to 1.4.1 using the autoupdater. It will update all gcodes, so it will be busy for a while after the update. But there we have at least fixed all known bugs so far.
1) I have uploaded two htop screenshots taken a few seconds apart - to try and capture some of the dynamics. I noticed most things stay the same over 1-2 minutes of looking at it (not sure that was enough time though. Not sure if I should also configure htop somehow to show different output. That's how I use htop when my roommate and I are struggling with openwrt - figured that's enough data).
2) Both printers are actually currently printing (after the earlier failure, I didn't even restart the raspberry, just carried on normally and started new prints). So what we see now is just the same session from when I took the logs, and currently both printers are running for 2-3 hours already (which is the normal day-to-day activity here).
3) You are correct. I sometimes remember how newbish I sound (or "read") when saying I "uploaded STL file" instead of "G-Code file". Always clicking on either "EXPORT G-CODE" or "UPLOAD G-CODE" should have registered somewhere. So yeah, of course G-CODE.
4) Your comment about the G-CODE line (restarting after analyzing it). I think there is a pattern, but I am not completely sure about it - that the crashes happen when a file is uploaded to Pulse. I know of at least one time when it happened exactly that way. Other times I either don't remember the exact scenario (or I had it reported to me by my roomie).
Let's assume it happens when a file is uploaded - although I think I see a pattern of it happening this way (and I will follow up and provide samples as asked) - I don't think it ever happened with Prusa. Remember that on Prusa I push a print job with the Slicer - click and go. No manual web browsing and uploading a MatterControl sliced and exported G-CODE.
So I think that at the moment, I may suggest isolating the situation to uploading via a web browser. Not sure if it is also related to the type of printer - I would assume it isn't as I didn't see any faulty USB/Serial/Printer messages in between the crash and uploading (as you pointed out).
5) Regarding monitoring. I know my way around raspberries but only enough to be helpful when guided. Is there any way I could monitor specific values 24/7 (such as memory consumption, processes, etc) and keep that in a log file? I'd prefer setting up continuous logging that I'd be able to freeze and extract after a crash. If you can suggest a strategy for that, I'd love to implement it and keep as much data as possible.
6) Finally - updating. My favorite part of any modern technology. Will do! So once the current prints are done, I'll go ahead and update. Should it happen again, I hope you could suggest how best to monitor which values for the next crash.
Just in case, I will also attach the specific G-CODE that was uploaded (though I'm pretty sure every crash had a different G-CODE to it. I rarely reuse G-CODES. Hi, I hardly keep STLs!).
HTOP1: https://i.postimg.cc/fW85gjvJ/repHTOP1.png
HTOP2: https://i.postimg.cc/52dzP6DX/repHTOP2.png
For logging memory I don't know a good tool, but you could log the server memory usage every minute. Then you can also see whether it goes high immediately or slowly. So log in as pi and run
nano /home/pi/logServerMemory
Write the following content:
date >> /home/pi/memory.log
Save and then make it executable:
chmod 755 /home/pi/logServerMemory
Now add it to cron so it gets executed every minute:
crontab -e
Select nano as editor. Add this line and save:
* * * * * /home/pi/logServerMemory
This will create a log /home/pi/memory.log
NOTE: It will grow forever, so delete it from time to time if nothing happened; otherwise the disk will fill up at some point.
To see the last 200 lines, run
tail -200 /home/pi/memory.log
The 4th column is memory usage, so it will be around 1.9 in your case. When it is above 5, I'd assume the problem has kicked in, and you need to check when it started increasing and remember what you did at that point.
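The script content as shown in this thread contains only the date line, while the explanation refers to a 4th column, which matches the %MEM column of ps aux. Assuming a ps aux line for the server process was intended, a complete script might look like the sketch below. The process name RepetierServer is an assumption and should be confirmed with ps aux on the Pi; the function names are illustrative only.

```shell
#!/bin/bash
# Sketch of /home/pi/logServerMemory. The process name is an ASSUMPTION:
# confirm it with `ps aux` and adjust PATTERN if needed.
PATTERN="${PATTERN:-RepetierServer}"

# Read `ps aux` output on stdin and print only lines containing the pattern.
filter_server_lines() {
    awk -v pat="$1" 'index($0, pat) > 0'
}

# Append a timestamp plus the matching process lines to the log file.
log_server_memory() {
    local log_file="$1"
    local ps_out
    # Take the snapshot first, so the awk filter itself cannot show up in it.
    ps_out=$(ps aux)
    {
        date
        printf '%s\n' "$ps_out" | filter_server_lines "$PATTERN"
    } >> "$log_file"
}

# Default path follows the thread; cron sets HOME=/home/pi for user pi.
log_server_memory "${1:-$HOME/memory.log}"
```

Each run then appends one date line followed by the ps aux line(s) of the server, so the 4th column of those lines is the memory percentage to watch.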
Hope this helps anyone in the future, and thanks @Repetier for the assistance with this!
I'll report back in a few days just in case
1) I crontab'ed the script as you suggested (and also added a 1MB limit on it for archiving in a secondary file so I could always delete it and still have the monitor running with latest events).
2) I have attached some values here from the monitor, perhaps they will make more sense to you - but I constantly see 1.7 (lower than 1.9 even) as the usage, up to a point where it is a "bit" higher - 2.3 +/-. All the same, that doesn't seem to be an issue here.
3) Yesterday around 13:00 (25th July) I also updated to 1.4.1. Attached are some of the monitor logs from before and after the update (I think the change in memory, minor as it is, was a result of the update. I manually rebooted the pi after the update).
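For anyone wanting the 1MB limit mentioned in point 1, it could look roughly like the sketch below. This is not the script actually used in this thread; rotate_if_large, the .old suffix and the exact threshold are illustrative assumptions.

```shell
#!/bin/bash
# rotate_if_large is a hypothetical helper: once the log exceeds max_bytes,
# it is moved to "<log>.old" (overwriting the previous archive), so the
# logger starts a fresh file on its next run.
rotate_if_large() {
    local log_file="$1"
    local max_bytes="${2:-1048576}"   # 1 MiB by default
    [ -f "$log_file" ] || return 0    # nothing to rotate yet
    local size
    size=$(wc -c < "$log_file")
    if [ "$size" -gt "$max_bytes" ]; then
        mv -f "$log_file" "$log_file.old"
    fi
}
```

Calling `rotate_if_large /home/pi/memory.log` before each write keeps roughly 2MB on disk at most: the current log plus one archived copy.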
If there are any more failures, I'll report back. I'll try to recreate the situation where I upload a G-CODE to the Pulse while the Prusa is effortlessly about to finish a complex 7-hour print, and see if anything still happens.
Thanks again for the efforts, and may this be fruitful for anyone stumbling on something similar in the future so we don't end up like *THIS* guy!
If you have access to it before it crashes due to memory usage also try to provide
ps aux
output. One case where this can indeed happen is if you have a websocket open and a request requires starting an external app that never returns. That blocks the socket, and we add data to a cache to deliver after the request is finished. I'm already thinking about a strategy to at least stop adding data for such processes, but since they are busy it is quite difficult.