Out of memory crash repetier-server on raspberry pi
Hello,
Repetier-server 0.90.7 on raspberry pi 3 (Raspian 9)
During a print, repetier-server was killed by the system and restarted with a "Out of memory: Kill process 680 (RepetierServer) score 867 or sacrifice child" log.
Logs :
/var/lib/Repetier-Server/logs/server.log
Repetier-server 0.90.7 on raspberry pi 3 (Raspian 9)
During a print, repetier-server was killed by the system and restarted with a "Out of memory: Kill process 680 (RepetierServer) score 867 or sacrifice child" log.
Logs :
/var/lib/Repetier-Server/logs/server.log
2018-10-23 18:07:52: start printjob 34505_manche +4 on printer HD
2018-10-23 18:07:53: LUA job rendering.render#{"id":8,"image":"/var/lib/Repetier-Server/printer/HD/jobs/00000008_34505_manche +4_l.png","layer":"/var/lib/Repetier-Server/printer/HD/jobs/00000008_34505_manche +4.layer","list":"jobs","slug":"HD"}
2018-10-23 18:09:15: Reading configuration file ...
Input file: /var/lib/Repetier-Server/printer/HD/jobs/00000008_34505_manche +4.layer
Rendering shadows ...10.6502 seconds.
Rendering gcode ...54.7037 seconds.
Number of end triangles: 275352
Number of normal triangles: 9381918
Rendering background ...
Time for raytracing: 15.0837 seconds.
images stored.
Computation time: 82.6972 seconds
2018-10-24 00:14:25: error:WifiManager::scanMode:System exception
2018-10-24 00:16:51: error:WifiManager::scanMode:System exception
2018-10-24 00:16:54: error:WifiManager::scanNetworks:System exception
2018-10-24 00:17:58: error:WifiManager::scanMode:System exception
2018-10-24 00:17:59: error:WifiManager::scanNetworks:System exception
2018-10-24 00:19:01: error:WifiManager::scanMode:System exception
2018-10-24 00:19:01: error:WifiManager::scanNetworks:System exception
2018-10-24 00:20:03: error:WifiManager::scanMode:System exception
2018-10-24 00:20:04: error:WifiManager::scanNetworks:System exception
2018-10-24 00:21:07: error:WifiManager::scanMode:System exception
2018-10-24 00:21:08: error:WifiManager::scanNetworks:System exception
2018-10-24 00:22:11: error:WifiManager::scanMode:System exception
2018-10-24 00:22:13: error:WifiManager::scanNetworks:System exception
2018-10-24 00:23:18: error:WifiManager::scanMode:System exception
2018-10-24 00:23:22: error:WifiManager::scanNetworks:System exception
2018-10-24 00:24:31: error:WifiManager::scanMode:System exception
2018-10-24 00:24:34: error:WifiManager::scanNetworks:System exception
2018-10-24 00:25:49: error:WifiManager::scanMode:System exception
2018-10-24 00:26:03: error:WifiManager::scanNetworks:System exception
2018-10-24 00:26:28: Start logging...
2018-10-24 00:26:28: Imported external command Shutdown Server
2018-10-24 00:26:28: Imported external command Reboot Server
.....
/var/log/syslog
/var/log/syslog
Oct 24 00:17:16 Repetier-Server CRON[8933]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094270] usbhid-ups invoked oom-killer: gfp_mask=0x14000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094285] usbhid-ups cpuset=/ mems_allowed=0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094300] CPU: 2 PID: 814 Comm: usbhid-ups Tainted: G C 4.14.77-v7+ #1154
Oct 24 00:26:26 Repetier-Server kernel: [102028.094303] Hardware name: BCM2835
Oct 24 00:26:26 Repetier-Server kernel: [102028.094324] [<8010ff30>] (unwind_backtrace) from [<8010c174>] (show_stack+0x20/0x24)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094334] [<8010c174>] (show_stack) from [<80788e64>] (dump_stack+0xd4/0x118)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094346] [<80788e64>] (dump_stack) from [<802249e0>] (dump_header+0xac/0x208)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094356] [<802249e0>] (dump_header) from [<80223d48>] (oom_kill_process+0x478/0x584)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094364] [<80223d48>] (oom_kill_process) from [<802246a8>] (out_of_memory+0x124/0x334)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094374] [<802246a8>] (out_of_memory) from [<8022a180>] (__alloc_pages_nodemask+0x1060/0x11d8)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094383] [<8022a180>] (__alloc_pages_nodemask) from [<8022a318>] (__get_free_pages+0x20/0x6c)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094393] [<8022a318>] (__get_free_pages) from [<805c2be0>] (usbdev_ioctl+0xfb0/0x1ac8)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094404] [<805c2be0>] (usbdev_ioctl) from [<8029eab0>] (do_vfs_ioctl+0xac/0x7c4)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094412] [<8029eab0>] (do_vfs_ioctl) from [<8029f20c>] (SyS_ioctl+0x44/0x6c)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094421] [<8029f20c>] (SyS_ioctl) from [<80108000>] (ret_fast_syscall+0x0/0x28)
Oct 24 00:26:26 Repetier-Server kernel: [102028.094424] Mem-Info:
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] active_anon:107744 inactive_anon:107805 isolated_anon:0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] active_file:250 inactive_file:294 isolated_file:0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] unevictable:0 dirty:0 writeback:0 unstable:0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] slab_reclaimable:1824 slab_unreclaimable:5291
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] mapped:1421 shmem:2585 pagetables:955 bounce:0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094436] free:5169 free_pcp:2 free_cma:1271
Oct 24 00:26:26 Repetier-Server kernel: [102028.094444] Node 0 active_anon:430976kB inactive_anon:431220kB active_file:1000kB inactive_file:1176kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:5684kB dirty:0kB writeback:0kB shmem:10340kB writeback_tmp:0kB unstable:0kB all_unreclaimable? yes
Oct 24 00:26:26 Repetier-Server kernel: [102028.094456] Normal free:20676kB min:16384kB low:20480kB high:24576kB active_anon:430976kB inactive_anon:431220kB active_file:1040kB inactive_file:1396kB unevictable:0kB writepending:0kB present:970752kB managed:949448kB mlocked:0kB kernel_stack:1208kB pagetables:3820kB bounce:0kB free_pcp:8kB local_pcp:0kB free_cma:5084kB
Oct 24 00:26:26 Repetier-Server kernel: [102028.094458] lowmem_reserve[]: 0 0
Oct 24 00:26:26 Repetier-Server kernel: [102028.094466] Normal: 1500*4kB (UMEC) 294*8kB (UMEC) 103*16kB (UMEC) 26*32kB (EC) 12*64kB (UEC) 8*128kB (UEC) 1*256kB (U) 0*512kB 2*1024kB (UE) 1*2048kB (U) 1*4096kB (C) = 21072kB
Oct 24 00:26:26 Repetier-Server kernel: [102028.094513] 3410 total pagecache pages
Oct 24 00:26:26 Repetier-Server kernel: [102028.094517] 234 pages in swap cache
Oct 24 00:26:26 Repetier-Server kernel: [102028.094521] Swap cache stats: add 27158, delete 26924, find 1712/3144
Oct 24 00:26:26 Repetier-Server kernel: [102028.094523] Free swap = 0kB
Oct 24 00:26:26 Repetier-Server kernel: [102028.094525] Total swap = 102396kB
Oct 24 00:26:26 Repetier-Server kernel: [102028.094529] 242688 pages RAM
Oct 24 00:26:26 Repetier-Server kernel: [102028.094531] 0 pages HighMem/MovableOnly
Oct 24 00:26:26 Repetier-Server kernel: [102028.094533] 5326 pages reserved
Oct 24 00:26:26 Repetier-Server kernel: [102028.094536] 2048 pages cma reserved
Oct 24 00:26:26 Repetier-Server kernel: [102028.094539] [ pid ] uid tgid total_vm rss nr_ptes nr_pmds swapents oom_score_adj name
Oct 24 00:26:26 Repetier-Server kernel: [102028.094558] [ 90] 0 90 8779 1229 15 0 59 0 systemd-journal
Oct 24 00:26:26 Repetier-Server kernel: [102028.094564] [ 128] 0 128 3703 30 9 0 159 -1000 systemd-udevd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094571] [ 278] 100 278 4320 14 9 0 95 0 systemd-timesyn
Oct 24 00:26:26 Repetier-Server kernel: [102028.094577] [ 322] 0 322 1324 16 7 0 38 0 cron
Oct 24 00:26:26 Repetier-Server kernel: [102028.094583] [ 327] 0 327 13255 0 18 0 349 0 ModemManager
Oct 24 00:26:26 Repetier-Server kernel: [102028.094589] [ 330] 65534 330 1324 4 6 0 56 0 thd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094596] [ 333] 108 333 1634 47 7 0 68 0 avahi-daemon
Oct 24 00:26:26 Repetier-Server kernel: [102028.094601] [ 336] 0 336 5937 19 9 0 144 0 rsyslogd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094608] [ 342] 105 342 1659 80 7 0 74 -900 dbus-daemon
Oct 24 00:26:26 Repetier-Server kernel: [102028.094613] [ 359] 108 359 1601 5 5 0 72 0 avahi-daemon
Oct 24 00:26:26 Repetier-Server kernel: [102028.094620] [ 371] 0 371 25545 232 26 0 284 0 NetworkManager
Oct 24 00:26:26 Repetier-Server kernel: [102028.094625] [ 373] 0 373 1728 20 8 0 70 0 systemd-logind
Oct 24 00:26:26 Repetier-Server kernel: [102028.094631] [ 389] 0 389 9415 1 13 0 153 0 polkitd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094637] [ 429] 0 429 2418 59 8 0 88 0 wpa_supplicant
Oct 24 00:26:26 Repetier-Server kernel: [102028.094642] [ 430] 0 430 1661 2 8 0 179 0 dhclient
Oct 24 00:26:26 Repetier-Server kernel: [102028.094648] [ 448] 0 448 524 0 5 0 34 0 hciattach
Oct 24 00:26:26 Repetier-Server kernel: [102028.094655] [ 453] 0 453 1818 0 8 0 84 0 bluetoothd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094661] [ 613] 0 613 721 12 5 0 73 0 dhcpcd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094666] [ 635] 0 635 1049 0 7 0 35 0 agetty
Oct 24 00:26:26 Repetier-Server kernel: [102028.094673] [ 649] 0 649 2552 0 8 0 138 -1000 sshd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094678] [ 675] 0 675 11763 6 14 0 266 0 nginx
Oct 24 00:26:26 Repetier-Server kernel: [102028.094684] [ 676] 33 676 11797 2 15 0 308 0 nginx
Oct 24 00:26:26 Repetier-Server kernel: [102028.094690] [ 677] 33 677 11797 18 15 0 297 0 nginx
Oct 24 00:26:26 Repetier-Server kernel: [102028.094696] [ 678] 33 678 11797 18 15 0 297 0 nginx
Oct 24 00:26:26 Repetier-Server kernel: [102028.094701] [ 679] 33 679 11797 22 15 0 297 0 nginx
Oct 24 00:26:26 Repetier-Server kernel: [102028.094707] [ 680] 109 680 296805 211632 498 0 16085 0 RepetierServer
Oct 24 00:26:26 Repetier-Server kernel: [102028.094713] [ 681] 0 681 1266 0 6 0 55 0 repetier_ui.sh
Oct 24 00:26:26 Repetier-Server kernel: [102028.094719] [ 682] 0 682 2082 2 9 0 92 0 sudo
Oct 24 00:26:26 Repetier-Server kernel: [102028.094724] [ 686] 0 686 5964 27 12 0 1140 0 python
Oct 24 00:26:26 Repetier-Server kernel: [102028.094730] [ 687] 0 687 6620 51 16 0 296 0 nmbd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094736] [ 713] 0 713 10164 89 24 0 411 0 smbd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094742] [ 714] 0 714 9663 44 22 0 417 0 smbd-notifyd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094748] [ 715] 0 715 9661 43 21 0 417 0 cleanupd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094754] [ 717] 0 717 10166 84 24 0 397 0 lpqd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094760] [ 814] 111 814 668 25 6 0 41 0 usbhid-ups
Oct 24 00:26:26 Repetier-Server kernel: [102028.094766] [ 824] 111 824 1246 17 7 0 58 0 upsd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094772] [ 1402] 65534 1402 2308 22 9 0 71 0 dnsmasq
Oct 24 00:26:26 Repetier-Server kernel: [102028.094778] [28113] 0 28113 3761 69 12 0 175 0 cupsd
Oct 24 00:26:26 Repetier-Server kernel: [102028.094784] [28114] 0 28114 8824 0 16 0 242 0 cups-browsed
Oct 24 00:26:26 Repetier-Server kernel: [102028.094790] Out of memory: Kill process 680 (RepetierServer) score 867 or sacrifice child
Oct 24 00:26:26 Repetier-Server kernel: [102028.094868] Killed process 680 (RepetierServer) total-vm:1187220kB, anon-rss:846528kB, file-rss:0kB, shmem-rss:0kB
Oct 24 00:26:26 Repetier-Server kernel: [102028.333858] oom_reaper: reaped process 680 (RepetierServer), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Oct 24 00:26:26 Repetier-Server systemd[1]: RepetierServer.service: Main process exited, code=killed, status=9/KILL
Oct 24 00:26:26 Repetier-Server systemd[1]: RepetierServer.service: Unit entered failed state.
Oct 24 00:26:26 Repetier-Server systemd[1]: RepetierServer.service: Failed with result 'signal'.
Oct 24 00:26:26 Repetier-Server systemd[1]: RepetierServer.service: Service has no hold-off time, scheduling restart.
Oct 24 00:26:26 Repetier-Server systemd[1]: Stopped Repetier-Server 3D Printer Server.
Oct 24 00:26:26 Repetier-Server systemd[1]: Starting Repetier-Server 3D Printer Server...
Oct 24 00:26:28 Repetier-Server systemd[1]: Started Repetier-Server 3D Printer Server.
...
...
Comments
Normally the server is using maybe 20MB so either one action causes a memory leak under some conditions or you did something using much memory, also normally big tasks are done outside.
Was the server running for a long time already?
I ask just to find a hint when the problem exactly occurs so I get a chance to find it.
The server was running for 2 or 3 days max.
The raspberry is connected to the lan by ethernet. The wifi (hotspot done by repetier) only manages two connected sockets.
top -H :
https://www.repetier-server.com/knowledgebase/prepare-for-prusa-mk3-on-raspberry-pi-zerow-rambo-einsy-board/
on how to disable it. Otherwise edit /home/pi/.bashrc at end you see
if ! pgrep "xinit" > /dev/null
then
sudo chmod 660 /dev/tty*
xinit -- -nocursor 2> /dev/null > /dev/null &
fi
The xinit line is starting xserver with chromium. Put a # in front to comment it and reboot.Memory usage seems to differ a lot.
13 days up, some prints on a pi zero (512MB):
630 repetie+ 0 -20 193304 27444 16068 S 0.3 6.2 39:51.20 RepetierServer
4 days up multiple printers Pi3, 1GB:
1108 repetie+ 0 -20 280192 21120 13512 S 0.3 2.2 4:43.45 RepetierServer
For clearance - virtual memory is not what is used. RES or MEM% is what the software actively is using. VIRST includes shared memory and shared libraries. Also all threads use the same memory so it is not 15 * RES in your case.
Speaking of threads I see 15. My pi zero no external connections has 9. Each connection adds one thread and each printer 2 more threads. Do you have several printers connected? Would at least explain you are using 56MB RAM.
One thing I can think of is that one communication thread does not close when connection is closed (e.g. browser closed, pc shut down). If it still stores missing data to send it would at some point use the memory. But all connections have a timeout so that is not happening as fas as I can see.
Will keep an eye on this.
I have only 1 printer + 1 web connection + websockets to send gcode when I push physical buttons on the rapsberry (for extracting/pushing filament or pause/cancel print job).
I will check if websocket connection are closed. But when the server crash, no buttons was pushed.
Thanks a lot.
https://github.com/fthome/repetier_ui
after a new boot, a send a print and every hour I send a "free -m" to a file.
21:00 : 60MB used
22:00 : 66MB used
23:00 : 71MB used
00:00 : 150 MB used
01:00 : 250 MB used
02:00 : 338 MB used
03:00 : 353 MB used
04:00 : 354 MB used
05:00 : 354 MB used
06:00 : 354 MB used
07:00 : 355 MB used
08:00 : 361 MB used
It is the processus Repetier-Server who use the memory
During this period I've just print a job :
/var/lib/Repetier-Server/logs/server.log
the external command is "python set_ifttt.py CMD" (https://github.com/fthome/repetier_ui)
ps aux | grep tier
free is over all programs.Can you also try in parallel if this happens if your command is not running. Just to exclude that this is causing the problem. After all I know users running prints much longer without memory issues. So first step should be to find what is triggering the error.
abstract :
from websocket import create_connection
datas = {'action': 'send', 'data': {'cmd': gcode},'printer':printer}
if repetier_ui is disable, the memory usage is stable
if repetier_ui is enable, the memory usage increase 4 hours after I use my code
If websocket is the problem, can you tell me how to use the api to send gcode to the printer.
the mesures :
2018-10-30 08:00:47,181 - INFO - GPIO10 FALLING => GCODE : goto_z_max.gcode send to HD.
So after 15 hours it increased to 45MB memory usage.
I could debug my memory increase and it is a result of the lua interpreter having not run garbage collector. Not sure when it would decide to run, so for next release I have now forced a garbage collection after every call. That may take in total more time but each collection is tiny and still quick I guess. Need to do some more tests.
Communication over websockets might also cause lua calls as that is how normal users could extend the server side to add new functions. But only if the command is not known by the c++ part, so normally not the problem.
I will try to make a websocket tester that runs many websocket reconnects. Maybe there is a memory issue as well. If I have understoo dthe cod ecorrectly you establish for each test a websocket so a problem there could easily increase faster the memory usage.
Using it as simple api call would be better here, see
https://www.repetier-server.com/manuals/programming/API/index.html
User websocket commands directly
In the websocket chapter you will see all the small commands you normally use while you are displaying server status. Soem of them might also be useful for simple apps just wanting to show some informations. For these we offer a access over the REST api. The only thing that does not work are events as they are reported to open sockets only.
The above schema can be used to send any websocket command as REST call. Use GET or POST just as you like. For bigger data POSt should be preferred. You can replace apikey with sess, but you would normally do this only if you have no session open, so apikey is the normal way to use it.
But i do not establish a lot of websocket connections : this morning, I just make two short connections.
Thanks a lot
Yesterday I trie dusing a browser and reloaded it 50 times without noticeable memory increase. I know it closes connection and leakage tool did not find memory leakage, so in general websockets should not be a problem. But if they keep open it might be a different think.