RepetierServer killed

Hi, 

I have some random kills of the RepetierServer process. I'm running it on an Intel NUC mini computer with Ubuntu Server. There are 20 printers connected, each through a Raspberry Pi running ser2net, but that part works pretty well as far as I can see. From time to time the RepetierServer service gets killed, but I can't figure out why. I've monitored CPU usage and RAM with sar, but everything seems stable, and the NUC doesn't reboot or anything. The only thing happening at the time was one printer that didn't have its USB connector plugged in and was not disabled, so it kept logging 'error: Reading serial conection failed: End of file. Closing connection.' Could that be a reason for a crash? Does anyone have a clue where to find an error or cause like this?

Thanks in advance,
Christophe

Syslog:
Jan 23 22:12:39 toadi3dprinters systemd[1]: RepetierServer.service: Main process exited, code=killed, status=6/ABRT
Jan 23 22:12:39 toadi3dprinters systemd[1]: RepetierServer.service: Failed with result 'signal'.
Jan 23 22:12:39 toadi3dprinters systemd[1]: RepetierServer.service: Service has no hold-off time (RestartSec=0), scheduling restart.
Jan 23 22:12:39 toadi3dprinters systemd[1]: RepetierServer.service: Scheduled restart job, restart counter is at 1.
Jan 23 22:12:39 toadi3dprinters systemd[1]: Stopped Repetier-Server 3D Printer Server.
Jan 23 22:12:39 toadi3dprinters systemd[1]: Starting Repetier-Server 3D Printer Server...
Jan 23 22:12:39 toadi3dprinters systemd[1]: Started Repetier-Server 3D Printer Server.
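For reference, "code=killed, status=6/ABRT" in the syslog above means the process ended on signal 6, SIGABRT. That signal is normally raised by the program itself through abort(), for example when glibc detects heap corruption, an assertion fails, or an unhandled C++ exception reaches std::terminate. It is not what the kernel's out-of-memory killer sends: that is SIGKILL, which systemd would report as "status=9/KILL". This fits with CPU and RAM looking normal in sar. A minimal Python sketch that decodes such a line; the regular expression is an assumption fitted to the journal line shown here, not a general parser of systemd output, and signal numbers are platform-specific (6 and 9 as on Linux).

```python
import re
import signal

# Assumed pattern, fitted to the single systemd line quoted above.
STATUS_RE = re.compile(r"code=killed, status=(\d+)/(\w+)")


def decode_kill_status(journal_line):
    """Return the signal name for a 'code=killed' systemd line, or None."""
    m = STATUS_RE.search(journal_line)
    if m is None:
        return None
    # Map the numeric status to the platform's signal name via the signal module.
    return signal.Signals(int(m.group(1))).name


line = ("Jan 23 22:12:39 toadi3dprinters systemd[1]: RepetierServer.service: "
        "Main process exited, code=killed, status=6/ABRT")
print(decode_kill_status(line))
```

In other words, the service was not killed from outside; the server aborted itself, which points at a bug inside the process rather than at resource exhaustion.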
RepetierServer log:
2020-01-23 22:12:31: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:32: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:34: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:35: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:37: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:39: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:39: Start logging...
2020-01-23 22:12:39: Webdirectory: /usr/local/Repetier-Server/www/
2020-01-23 22:12:39: Storage directory: /var/lib/Repetier-Server/
2020-01-23 22:12:39: Configuration file: /usr/local/Repetier-Server/etc/RepetierServer.xml
2020-01-23 22:12:39: Directory for temporary files: /tmp/
2020-01-23 22:12:39: Reading firmware data ...
2020-01-23 22:12:39: Starting Network ...
2020-01-23 22:12:39: Active features:4095
2020-01-23 22:12:39: Reading printer configurations ...
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0016.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0016
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_16.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_16
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_12.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_12
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_13.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_13
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_006.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_006
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/Prusa_004.xml
2020-01-23 22:12:39: Starting printjob manager thread for Prusa_004
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0014.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0014
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/Prusa_001.xml
2020-01-23 22:12:39: Starting printjob manager thread for Prusa_001
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0017.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0017
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_011.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_011
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0015.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0015
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_3.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_3
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/Prusa_003.xml
2020-01-23 22:12:39: Starting printjob manager thread for Prusa_003
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_11.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_11
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/Prusa_002.xml
2020-01-23 22:12:39: Starting printjob manager thread for Prusa_002
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_15.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_15
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_10.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_10
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_9.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_9
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_001.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_001
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/RD_Printer_1.xml
2020-01-23 22:12:39: Starting printjob manager thread for RD_Printer_1
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_003.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_003
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0013.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0013
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_0012.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_0012
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_004.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_004
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/Prusa_005.xml
2020-01-23 22:12:39: Starting printjob manager thread for Prusa_005
2020-01-23 22:12:39: Reading printer config /var/lib/Repetier-Server/configs/CR_14.xml
2020-01-23 22:12:39: Starting printjob manager thread for CR_14
2020-01-23 22:12:39: Starting printer threads ...
2020-01-23 22:12:39: Starting printer thread for CR_16
2020-01-23 22:12:39: Starting printer thread for CR_006
2020-01-23 22:12:39: Starting printer thread for CR_0016
2020-01-23 22:12:39: Starting printer thread for Prusa_001
2020-01-23 22:12:39: Starting printer thread for Prusa_004
2020-01-23 22:12:39: Starting printer thread for CR_011
2020-01-23 22:12:39: Starting printer thread for CR_0015
2020-01-23 22:12:39: Starting printer thread for Prusa_003
2020-01-23 22:12:39: Starting printer thread for CR_13
2020-01-23 22:12:39: Starting printer thread for CR_15
2020-01-23 22:12:39: Starting printer thread for CR_10
2020-01-23 22:12:39: Starting printer thread for CR_12
2020-01-23 22:12:39: Starting printer thread for RD_Printer_1
2020-01-23 22:12:39: Starting printer thread for CR_11
2020-01-23 22:12:39: Starting printer thread for CR_0013
2020-01-23 22:12:39: Starting printer thread for CR_004
2020-01-23 22:12:39: Starting printer thread for CR_0017
2020-01-23 22:12:39: Starting printer thread for Prusa_005
2020-01-23 22:12:39: Starting printer thread for CR_0014
2020-01-23 22:12:39: Starting printer thread for CR_3
2020-01-23 22:12:39: Starting printer thread for CR_9
2020-01-23 22:12:39: Starting printer thread for CR_003
2020-01-23 22:12:39: Starting printer thread for Prusa_002
2020-01-23 22:12:39: Starting printer thread for CR_0012
2020-01-23 22:12:39: Starting printer thread for CR_14
2020-01-23 22:12:39: Starting printer thread for CR_001
2020-01-23 22:12:39: Starting work dispatcher subsystem ...
2020-01-23 22:12:39: Starting user database ...
2020-01-23 22:12:39: Importing projects ...
2020-01-23 22:12:39: Initializing LUA ...
2020-01-23 22:12:39: Register LUA cloud services
2020-01-23 22:12:39: add G-Code-Renderer
2020-01-23 22:12:39: LUA initalization finished.
2020-01-23 22:12:39: Work dispatcher thread started.
2020-01-23 22:12:39: Internal work dispatcher thread started.
2020-01-23 22:12:39: Starting web server ...
2020-01-23 22:12:39: Webserver started.
2020-01-23 22:12:40: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:40: Connection started: Creality 20
2020-01-23 22:12:40: Connection started: Creality 04
2020-01-23 22:12:40: Connection started: Creality 17
2020-01-23 22:12:40: Connection started: Creality 18
2020-01-23 22:12:40: Connection started: Creality 02
2020-01-23 22:12:40: Connection started: Creality 01
2020-01-23 22:12:40: Connection started: Creality 03
2020-01-23 22:12:41: Connection started: Creality 14
2020-01-23 22:12:41: Connection started: Creality 11
2020-01-23 22:12:41: Connection started: Creality 05
2020-01-23 22:12:41: Connection started: Creality 19
2020-01-23 22:12:41: Connection started: Creality 09
2020-01-23 22:12:41: Connection started: Creality 13
2020-01-23 22:12:41: Connection started: Creality 10
2020-01-23 22:12:41: Connection started: Creality 07
2020-01-23 22:12:41: Connection started: Creality 08
2020-01-23 22:12:41: Connection started: Creality 06
2020-01-23 22:12:41: Connection started: Creality 15
2020-01-23 22:12:41: Connection started: Creality 12
2020-01-23 22:12:41: Connection started: Creality 16
2020-01-23 22:12:41: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:42: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:44: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:45: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:46: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:48: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:50: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:52: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:54: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:56: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:57: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:12:58: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:12:59: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:00: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:02: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:03: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:05: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:06: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:08: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:10: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:12: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:13: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:14: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:16: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:17: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:18: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:20: error: Reading serial conection failed: Connection reset by peer. Closing connection.
2020-01-23 22:13:21: error: Reading serial conection failed: End of file. Closing connection.
2020-01-23 22:13:23: error: Reading serial conection failed: Connection reset by peer. Closing connection.
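The server writes a "Start logging..." line when it starts, as in the log above, so the lines just before each such marker show what the server was doing right before it died. The Python sketch below collects those lines and tallies the repeated serial errors by reason. It assumes the log has already been read into a list of strings (the log file's location is not given in this thread); the function names are hypothetical helpers, and the message text is matched with the spelling the server actually logs ("conection").

```python
from collections import Counter

# Line the server logs at startup (see the log above); a restart by systemd
# therefore shows up as a new occurrence of this line.
RESTART_MARKER = "Start logging..."
# Error prefix, spelled exactly as the server logs it.
SERIAL_ERROR = "error: Reading serial conection failed: "


def lines_before_restarts(lines, context=5):
    """For each restart marker, return the `context` lines logged just before it."""
    blocks = []
    for i, line in enumerate(lines):
        if line.rstrip().endswith(RESTART_MARKER):
            blocks.append(lines[max(0, i - context):i])
    return blocks


def count_serial_errors(lines):
    """Tally serial read errors by reason, e.g. 'End of file'."""
    reasons = Counter()
    for line in lines:
        _, found, rest = line.partition(SERIAL_ERROR)
        if found:
            # The reason is the text before the first '.'.
            reasons[rest.split(".", 1)[0]] += 1
    return reasons


sample = [
    "2020-01-23 22:12:39: error: Reading serial conection failed: End of file. Closing connection.",
    "2020-01-23 22:12:39: Start logging...",
    "2020-01-23 22:12:41: error: Reading serial conection failed: Connection reset by peer. Closing connection.",
]
print(lines_before_restarts(sample))
print(count_serial_errors(sample))
```

Applied to the log above, the only lines before the restart are the repeated "End of file" errors from the unplugged printer, so the log by itself does not reveal the cause of the abort; a backtrace from a debugger is needed for that.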

Comments

  • Wow, 20 printers on one instance. I've never heard of anyone having more on one instance. Of course, that should be no reason for a crash.
    First, make sure you are running 0.93.1, since all known causes of hangs/crashes have been fixed in it so far.
    Then follow https://www.repetier-server.com/knowledgebase/debugging-crashes-hangs-on-linux/ with one deviation: attach the debugger and hit continue. Since your problem is a crash and not a hang, you need to have the server running in gdb so you get a full backtrace at the moment the crash happens. From that I can see where in the source code the crash happens and hopefully how that problem can arise.
    Importantly, the console running gdb must stay open, so it is best to open it on the NUC itself, which hopefully has a monitor and keyboard.
  • edited January 2020
    Yes, 20 printers and soon we'll go to 40! 

    I have updated the server to 0.93.1 and ran it under gdb in a screen session. I added 20 virtual printers (our real printers are currently all printing from SD card again) and started virtual prints on all of them; the G-code files have more than 100 hours of print time. The server crashed again after 2 days, which is sooner than before. The gdb log: https://pastebin.com/sYGNGvsv
    The server.log doesn't show anything strange.

    It does look like it has something to do with a virtual printer, so I hope this isn't a crash caused by the virtual printers rather than the actual crash we've been having. We could try adding the real printers one by one, but we can't afford to lose many prints at this point, so it's a bit hard to test.

    I hope you can already find something useful in it.
    Thanks for debugging with me! 
  • Hi,

    Any update on this? 

    Thx,
    Christophe
  • Not really. I could identify the function where it crashed: the response analyser. Unfortunately it is quite a big function, so it is not clear exactly where the problem is. I'm now running 24 concurrent virtual printers with jobs in the debugger with full debug information, but so far there has been no new crash. My system uses a newer compiler and libraries and might have different timing than your PC. I'm not sure whether you were just lucky that it happened after 2 days or whether the newer compiler environment prevents it. I will continue to run test prints and hope it happens again soon so I know exactly what it is. In any case it is not a simple bug: it must be a rare combination of states, with other threads manipulating the same data at the same time, that leads to an invalid pointer.

    Did you do anything, or just run the prints? Was any particular view open in the browser while it ran? Was the monitor running as well? Those can cause access from a different thread when the server fulfils queries from the browser. Maybe I should try more browser instances to increase the likelihood of problems.
  • I've only checked the status on the web page for a couple of minutes. In fact, on day two I never even opened the web interface. There are also no other monitoring apps or API requests going on; it's plain Repetier-Server with its 20 virtual printers.

    It did happen very fast with the virtual printers, though; with our real printers it always took a few weeks until a crash happened.

    Thanks for debugging with me! Printing from SD cards again is a huge pain, as we improve our G-code/models every week. Looking forward to working with your software again.
  • I fear it was pure luck that it happened so fast for you, if it normally takes weeks to get the problem. So it is a really rare condition we need to find. I will also reread the source of that function with the problem in mind, in the hope of spotting a race condition where it could happen.
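The approach suggested in the thread (run the server inside gdb, let it continue, and collect a backtrace when it crashes) can also be done non-interactively with gdb's batch mode, so no console has to stay open. This is a sketch under assumptions, not the procedure from the Repetier knowledge-base article: the binary path /usr/local/Repetier-Server/bin/RepetierServer and the "-c" config option are assumed (only the config file path appears in the log above), and the service should be stopped first (for example "sudo systemctl stop RepetierServer") and gdb run as the same user the service uses, so two instances don't compete for the printer connections. The helper names are hypothetical.

```python
import shlex
import subprocess
import sys

# Assumed locations: the config path appears in the server log above, but the
# binary path and the "-c" option are assumptions about a default Linux install.
SERVER_BIN = "/usr/local/Repetier-Server/bin/RepetierServer"
SERVER_CONF = "/usr/local/Repetier-Server/etc/RepetierServer.xml"


def gdb_command(server_bin=SERVER_BIN, conf=SERVER_CONF):
    """Build a gdb batch invocation that runs the server and dumps the
    backtraces of all threads once the program stops (e.g. on SIGABRT)."""
    return [
        "gdb", "-batch",
        # A network server can receive SIGPIPE; don't let gdb stop on it.
        "-ex", "handle SIGPIPE nostop noprint pass",
        "-ex", "run",
        # Executed after the program stops: full backtrace of every thread.
        "-ex", "thread apply all bt full",
        "--args", server_bin, "-c", conf,
    ]


def run_under_gdb(log_path="gdb-backtrace.txt"):
    """Run the server under gdb, writing all of gdb's output to log_path."""
    with open(log_path, "w") as log:
        return subprocess.run(gdb_command(), stdout=log,
                              stderr=subprocess.STDOUT).returncode


if __name__ == "__main__":
    # Print the command so it can also be pasted into a screen/tmux session.
    print(shlex.join(gdb_command()))
    if "--run" in sys.argv:
        sys.exit(run_under_gdb())
```

Using "thread apply all bt full" rather than a plain "bt" matters here because the suspected cause is another thread touching the same data; the backtraces of the other threads at the moment of the abort are what would show that.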