random freezes on raspberrypi 3

edited May 2019 in Bug Reports
Hi! I has raspberrypi 3 and random freezes when i have print. Log of the print in attachmend files ( https://dropmefiles.com/V9Oxl ).
I has install image for raspberrypi. Printer motherboard is mks gen l with repetier-firmware 1.0.3. Repetier-server is 0.91.2.







Comments

  • Without knowing exact times of blobs appearing it is not really possible to say where it happens. Best guess is where "wait" appears which is when firmware buffer is empty for a second without getting a new command. Here 2 cases where it happened:

    < 18:06:40.691: N278038 G1 X76.424 Y85.736 E1989.07437
    > 18:06:44.046: ok 15890
    > 18:06:44.046: ok 15891
    > 18:06:44.047: ok 15892
    > 18:06:44.047: ok 15893
    > 18:06:44.047: ok 15894
    > 18:06:44.048: T:239.74 /240 B:80.00 /80 B@:85 @:194
    > 18:06:44.049: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.049: wait
    > 18:06:44.050: T:239.49 /240 B:79.96 /80 B@:133 @:210
    > 18:06:44.050: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.050: wait
    > 18:06:44.051: T:239.49 /240 B:79.92 /80 B@:71 @:191
    > 18:06:44.051: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.051: wait
    < 18:06:44.052: N278039 M117 ETE 00:51:50
    < 18:06:44.053: N278040 G1 X76.586 Y85.654 E1989.08309


    > 18:08:33.574: ok 19927
    < 18:08:33.574: N282076 G1 X62.639 Y96.271 E2165.21748
    > 18:08:33.606: ok 19928
    < 18:08:33.606: N282077 G1 X62.677 Y96.495 E2165.22752
    > 18:08:33.632: ok 19929
    < 18:08:33.632: N282078 G1 X62.534 Y96.289 E2165.22941
    > 18:08:33.636: ok 19930
    < 18:08:33.636: N282079 G1 X62.313 Y96.07 E2165.23619
    > 18:08:33.641: ok 19931
    < 18:08:33.641: N282080 G1 X62.125 Y95.935 E2165.24452
    > 18:08:33.645: ok 19932
    < 18:08:33.645: N282081 G1 X62.055 Y95.885 E2165.24832
    > 18:08:34.847: ok 19933
    > 18:08:34.847: ok 19934
    > 18:08:34.847: ok 19935
    > 18:08:34.848: ok 19936
    > 18:08:34.848: ok 19937
    > 18:08:34.849: T:239.74 /240 B:79.96 /80 B@:77 @:202
    > 18:08:34.849: wait
    < 18:08:34.849: N282082 M117 ETA 18:58:25 day 12
    < 18:08:34.850: N282083 G1 X61.764 Y95.754 E2165.26242

    What you see is that the server sends new commands as soon as it receives the "ok" but they seem to get delayed which you see as firmware also sends wait, so firmware had already send them correctly but it took time to be received. 

    Please connect during print with your pi and check with "top" cpu and memory usage.  See if memory gets too high so it starts swapping or if some processes take too much cpu time so it will delay communication. It might also be a bandwidth problem with usb especially if you have a webcam with high resolution and view it. Network and printer go all over same usb bus so they may block communication to each other. Especially if you have a webcam without mjpg compression.

    Good thing is there are no communication errors, so that is ok. It is not a baud rate thing.

    One thing to improve might be use 32 moves buffer in firmware instead of default 16. That reduces the pauses and likelyhood to happen.
  • Repetier said:
    Without knowing exact times of blobs appearing it is not really possible to say where it happens. Best guess is where "wait" appears which is when firmware buffer is empty for a second without getting a new command. Here 2 cases where it happened:

    < 18:06:40.691: N278038 G1 X76.424 Y85.736 E1989.07437
    > 18:06:44.046: ok 15890
    > 18:06:44.046: ok 15891
    > 18:06:44.047: ok 15892
    > 18:06:44.047: ok 15893
    > 18:06:44.047: ok 15894
    > 18:06:44.048: T:239.74 /240 B:80.00 /80 B@:85 @:194
    > 18:06:44.049: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.049: wait
    > 18:06:44.050: T:239.49 /240 B:79.96 /80 B@:133 @:210
    > 18:06:44.050: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.050: wait
    > 18:06:44.051: T:239.49 /240 B:79.92 /80 B@:71 @:191
    > 18:06:44.051: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.051: wait
    < 18:06:44.052: N278039 M117 ETE 00:51:50
    < 18:06:44.053: N278040 G1 X76.586 Y85.654 E1989.08309


    > 18:08:33.574: ok 19927
    < 18:08:33.574: N282076 G1 X62.639 Y96.271 E2165.21748
    > 18:08:33.606: ok 19928
    < 18:08:33.606: N282077 G1 X62.677 Y96.495 E2165.22752
    > 18:08:33.632: ok 19929
    < 18:08:33.632: N282078 G1 X62.534 Y96.289 E2165.22941
    > 18:08:33.636: ok 19930
    < 18:08:33.636: N282079 G1 X62.313 Y96.07 E2165.23619
    > 18:08:33.641: ok 19931
    < 18:08:33.641: N282080 G1 X62.125 Y95.935 E2165.24452
    > 18:08:33.645: ok 19932
    < 18:08:33.645: N282081 G1 X62.055 Y95.885 E2165.24832
    > 18:08:34.847: ok 19933
    > 18:08:34.847: ok 19934
    > 18:08:34.847: ok 19935
    > 18:08:34.848: ok 19936
    > 18:08:34.848: ok 19937
    > 18:08:34.849: T:239.74 /240 B:79.96 /80 B@:77 @:202
    > 18:08:34.849: wait
    < 18:08:34.849: N282082 M117 ETA 18:58:25 day 12
    < 18:08:34.850: N282083 G1 X61.764 Y95.754 E2165.26242

    What you see is that the server sends new commands as soon as it receives the "ok" but they seem to get delayed which you see as firmware also sends wait, so firmware had already send them correctly but it took time to be received. 

    Please connect during print with your pi and check with "top" cpu and memory usage.  See if memory gets too high so it starts swapping or if some processes take too much cpu time so it will delay communication. It might also be a bandwidth problem with usb especially if you have a webcam with high resolution and view it. Network and printer go all over same usb bus so they may block communication to each other. Especially if you have a webcam without mjpg compression.

    Good thing is there are no communication errors, so that is ok. It is not a baud rate thing.

    One thing to improve might be use 32 moves buffer in firmware instead of default 16. That reduces the pauses and likelyhood to happen.
    Thx! Sometimes i check load CPU and RAM load when i can see an freezes on my printer. But i can see that CPU load and RAM is OK. I will change to 32 moves buffer and I hope that it will fix the problem.
  • edited May 2019
    I changed 32 moves buffer  - it is not work. Now i can see that if print is freeze in Htop is all OK with CPU and RAM. In console command i can see the freezes also (i mean that an command have got freezes in the console if the print have a freeze also).
    For help detect the problem i add an dmesg.log file. I changed port for the printer and it not helped.

    Repetier said:
    Without knowing exact times of blobs appearing it is not really possible to say where it happens. Best guess is where "wait" appears which is when firmware buffer is empty for a second without getting a new command. Here 2 cases where it happened:

    < 18:06:40.691: N278038 G1 X76.424 Y85.736 E1989.07437
    > 18:06:44.046: ok 15890
    > 18:06:44.046: ok 15891
    > 18:06:44.047: ok 15892
    > 18:06:44.047: ok 15893
    > 18:06:44.047: ok 15894
    > 18:06:44.048: T:239.74 /240 B:80.00 /80 B@:85 @:194
    > 18:06:44.049: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.049: wait
    > 18:06:44.050: T:239.49 /240 B:79.96 /80 B@:133 @:210
    > 18:06:44.050: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.050: wait
    > 18:06:44.051: T:239.49 /240 B:79.92 /80 B@:71 @:191
    > 18:06:44.051: Warning: Seems like we missed a ok - continue sending.
    > 18:06:44.051: wait
    < 18:06:44.052: N278039 M117 ETE 00:51:50
    < 18:06:44.053: N278040 G1 X76.586 Y85.654 E1989.08309


    > 18:08:33.574: ok 19927
    < 18:08:33.574: N282076 G1 X62.639 Y96.271 E2165.21748
    > 18:08:33.606: ok 19928
    < 18:08:33.606: N282077 G1 X62.677 Y96.495 E2165.22752
    > 18:08:33.632: ok 19929
    < 18:08:33.632: N282078 G1 X62.534 Y96.289 E2165.22941
    > 18:08:33.636: ok 19930
    < 18:08:33.636: N282079 G1 X62.313 Y96.07 E2165.23619
    > 18:08:33.641: ok 19931
    < 18:08:33.641: N282080 G1 X62.125 Y95.935 E2165.24452
    > 18:08:33.645: ok 19932
    < 18:08:33.645: N282081 G1 X62.055 Y95.885 E2165.24832
    > 18:08:34.847: ok 19933
    > 18:08:34.847: ok 19934
    > 18:08:34.847: ok 19935
    > 18:08:34.848: ok 19936
    > 18:08:34.848: ok 19937
    > 18:08:34.849: T:239.74 /240 B:79.96 /80 B@:77 @:202
    > 18:08:34.849: wait
    < 18:08:34.849: N282082 M117 ETA 18:58:25 day 12
    < 18:08:34.850: N282083 G1 X61.764 Y95.754 E2165.26242

    What you see is that the server sends new commands as soon as it receives the "ok" but they seem to get delayed which you see as firmware also sends wait, so firmware had already send them correctly but it took time to be received. 

    Please connect during print with your pi and check with "top" cpu and memory usage.  See if memory gets too high so it starts swapping or if some processes take too much cpu time so it will delay communication. It might also be a bandwidth problem with usb especially if you have a webcam with high resolution and view it. Network and printer go all over same usb bus so they may block communication to each other. Especially if you have a webcam without mjpg compression.

    Good thing is there are no communication errors, so that is ok. It is not a baud rate thing.

    One thing to improve might be use 32 moves buffer in firmware instead of default 16. That reduces the pauses and likelyhood to happen.

  • Ok, your htop screens show no high cpu or memory usage so no reason for a freeze here.

    The fact that console also freezes at the times shows clearly it is no server issue but complete linux freezes for some reason. Explains why everything continues without errors as there are no errors leading to the problem.

    The big question is what is linux doing that no processes run. A normal process can not do it I think so it must be something with high priority. Memory swapping is such a thing but total memory usage shows less then 10% are used, no swap. So must be something different. It might be wlan dependent. Can you test with ethernet connection and wlan completely disabled or set to always access point mode. Maybe it is loosing connection and reconnection blocks apps until it back again.

    If it i snot wlan you could try running in console
    tail -f /var/log/syslog
    and see messages just before/after freezes. If you always see the same on freezes you have a hint what is causing it.
  • edited May 2019
    Repetier said:
    Ok, your htop screens show no high cpu or memory usage so no reason for a freeze here.

    The fact that console also freezes at the times shows clearly it is no server issue but complete linux freezes for some reason. Explains why everything continues without errors as there are no errors leading to the problem.

    The big question is what is linux doing that no processes run. A normal process can not do it I think so it must be something with high priority. Memory swapping is such a thing but total memory usage shows less then 10% are used, no swap. So must be something different. It might be wlan dependent. Can you test with ethernet connection and wlan completely disabled or set to always access point mode. Maybe it is loosing connection and reconnection blocks apps until it back again.

    If it i snot wlan you could try running in console
    tail -f /var/log/syslog
    and see messages just before/after freezes. If you always see the same on freezes you have a hint what is causing it.
    I tested it. I connect uses ethernet port but wifi is on(Long ago). Now wifi is off (you can see it in syslog file). Now an freezes coincide in time with "wlan0: Failed to initiate sched scan" in syslog sometimes. I changed power supply for raspberry pi it not helped.
    syslog:

  • Ok, that comes from wlan scanner
    Run

    sudo chmod a-x /usr/local/Repetier-Setup/bin/wlanscanner

    to disable it so it not executabel.

    chmod a+x /usr/local/Repetier-Setup/bin/wlanscanner

    makes it executeable again.
    If it does not execute it can not block. Should at least show if that is the reason. If it is enable it so you can select your wifi and once wifi works you can make it not executeable again. wifi should still work but you prevent extra scants requested by server. Then question would still stay why it blocks for you and not for me, but first lets see if it really the reason.
  • edited May 2019
    Repetier said:
    Ok, that comes from wlan scanner
    Run

    sudo chmod a-x /usr/local/Repetier-Setup/bin/wlanscanner

    to disable it so it not executabel.

    chmod a+x /usr/local/Repetier-Setup/bin/wlanscanner

    makes it executeable again.
    If it does not execute it can not block. Should at least show if that is the reason. If it is enable it so you can select your wifi and once wifi works you can make it not executeable again. wifi should still work but you prevent extra scants requested by server. Then question would still stay why it blocks for you and not for me, but first lets see if it really the reason.
    It not helped. Now i has freezes ~1-2 sec and i can see that ~1-2 freezes  on 1 layer.
    May be i need reflash motherboard's firmware or reinstall sd image into raspberrypi?

  • Now i tested it from sdcard and it work good without freezes. USB cable may be?
  • Now that I see
    May 19 18:09:57 Repetier-Server wpa_supplicant[456]: wlan0: Failed to initiate sched scan
    it is clear it is not our scanner causing the messages that you said to coincide with the freezes. wpa_supplicant is the software doing it. So try to disable wlan0 completely to stop it from doing it (requires ethernet connection)
    sudo ifconfig wlan0 down

    hopefully stops it. See ifconfig afterwards.

    If you mean sd card on printer it is clear it does not freeze. If you are right and wpa_supplicant causes the problems a usb cable is not the solution. Also since you had no communication error there is no reason to thing it is the problem.
  • Repetier said:
    Now that I see
    May 19 18:09:57 Repetier-Server wpa_supplicant[456]: wlan0: Failed to initiate sched scan
    it is clear it is not our scanner causing the messages that you said to coincide with the freezes. wpa_supplicant is the software doing it. So try to disable wlan0 completely to stop it from doing it (requires ethernet connection)
    sudo ifconfig wlan0 down

    hopefully stops it. See ifconfig afterwards.

    If you mean sd card on printer it is clear it does not freeze. If you are right and wpa_supplicant causes the problems a usb cable is not the solution. Also since you had no communication error there is no reason to thing it is the problem.
    "wlan0: Failed to initiate sched scan" coincide with the freezes sometimes, not everytime. I will tested it solution.
    G
    M
    T
    Y
    Определить языкАзербайджанскийАлбанскийАмхарскийАнглийскийАрабскийАрмянскийАфрикаансБаскскийБелорусскийБенгальскийБирманскийБолгарскийБоснийскийВаллийскийВенгерскийВьетнамскийГавайскийГаитянскийГалисийскийГолландскийГреческийГрузинскийГуджаратиДатскийЗулуИвритИгбоИдишИндонезийскийИрландскийИсландскийИспанскийИтальянскийЙорубаКазахскийКаннадаКаталанскийКиргизскийКитайский ТрадКитайский УпрКорейскийКорсиканскийКурманджиКхмерскийКхосаЛаосскийЛатинскийЛатышскийЛитовскийЛюксембургскийМакедонскийМалагасийскийМалайскийМалаяламМальтийскийМаориМаратхиМонгольскийНемецкийНепальскийНорвежскийПанджабиПерсидскийПольскийПортугальскийПуштуРумынскийРусскийСамоанскийСебуанскийСербскийСесотоСингальскийСиндхиСловацкийСловенскийСомалийскийСуахилиСунданскийТаджикскийТайскийТамильскийТелугуТурецкийУзбекскийУкраинскийУрдуФилиппинскийФинскийФранцузскийФризскийХаусаХиндиХмонгХорватскийЧеваЧешскийШведскийШонаШотландский (гэльский)ЭсперантоЭстонскийЯванскийЯпонский
    АзербайджанскийАлбанскийАмхарскийАнглийскийАрабскийАрмянскийАфрикаансБаскскийБелорусскийБенгальскийБирманскийБолгарскийБоснийскийВаллийскийВенгерскийВьетнамскийГавайскийГаитянскийГалисийскийГолландскийГреческийГрузинскийГуджаратиДатскийЗулуИвритИгбоИдишИндонезийскийИрландскийИсландскийИспанскийИтальянскийЙорубаКазахскийКаннадаКаталанскийКиргизскийКитайский ТрадКитайский УпрКорейскийКорсиканскийКурманджиКхмерскийКхосаЛаосскийЛатинскийЛатышскийЛитовскийЛюксембургскийМакедонскийМалагасийскийМалайскийМалаяламМальтийскийМаориМаратхиМонгольскийНемецкийНепальскийНорвежскийПанджабиПерсидскийПольскийПортугальскийПуштуРумынскийРусскийСамоанскийСебуанскийСербскийСесотоСингальскийСиндхиСловацкийСловенскийСомалийскийСуахилиСунданскийТаджикскийТайскийТамильскийТелугуТурецкийУзбекскийУкраинскийУрдуФилиппинскийФинскийФранцузскийФризскийХаусаХиндиХмонгХорватскийЧеваЧешскийШведскийШонаШотландский (гэльский)ЭсперантоЭстонскийЯванскийЯпонский
    Звуковая функция ограничена 200 символами
  • edited May 2019
    Repetier said:
    Now that I see
    May 19 18:09:57 Repetier-Server wpa_supplicant[456]: wlan0: Failed to initiate sched scan
    it is clear it is not our scanner causing the messages that you said to coincide with the freezes. wpa_supplicant is the software doing it. So try to disable wlan0 completely to stop it from doing it (requires ethernet connection)
    sudo ifconfig wlan0 down

    hopefully stops it. See ifconfig afterwards.

    If you mean sd card on printer it is clear it does not freeze. If you are right and wpa_supplicant causes the problems a usb cable is not the solution. Also since you had no communication error there is no reason to thing it is the problem.
    It not helped for me. Now i can see freezes 3-8 sec sometimes.

  • So the messages in log are gone but still freezing?

    How good is the sd card?
    Here a short code sequence to test speed:

    dd if=/dev/zero of=~/test.tmp bs=500K count=1024
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    dd if=~/test.tmp of=/dev/null bs=500K count=1024
    rm ~/test.tmp


    Here my results for a normal working card:

    pi@Repetier-Server:~ $ dd if=/dev/zero of=~/test.tmp bs=500K count=1024

    1024+0 records in

    1024+0 records out

    524288000 bytes (524 MB, 500 MiB) copied, 41.5075 s, 12.6 MB/s

    pi@Repetier-Server:~ $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

    3

    pi@Repetier-Server:~ $ dd if=~/test.tmp of=/dev/null bs=500K count=1024

    1024+0 records in

    1024+0 records out

    524288000 bytes (524 MB, 500 MiB) copied, 24.1094 s, 21.7 MB/s

    pi@Repetier-Server:~ $ rm ~/test.tmp


    First speed 12.6MB/s is write speed. Second (21.7MB) is read speed. Maybe a slow card can produce the freezes. 

    Do you have logging enabled? Then try without logging as it means many writes. Also the rescue system does many writes so disable it. Then there is nearly no write left and reading is only at speed of printing.
  • Repetier said:
    So the messages in log are gone but still freezing?

    How good is the sd card?
    Here a short code sequence to test speed:

    dd if=/dev/zero of=~/test.tmp bs=500K count=1024
    sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    dd if=~/test.tmp of=/dev/null bs=500K count=1024
    rm ~/test.tmp


    Here my results for a normal working card:

    pi@Repetier-Server:~ $ dd if=/dev/zero of=~/test.tmp bs=500K count=1024

    1024+0 records in

    1024+0 records out

    524288000 bytes (524 MB, 500 MiB) copied, 41.5075 s, 12.6 MB/s

    pi@Repetier-Server:~ $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches

    3

    pi@Repetier-Server:~ $ dd if=~/test.tmp of=/dev/null bs=500K count=1024

    1024+0 records in

    1024+0 records out

    524288000 bytes (524 MB, 500 MiB) copied, 24.1094 s, 21.7 MB/s

    pi@Repetier-Server:~ $ rm ~/test.tmp


    First speed 12.6MB/s is write speed. Second (21.7MB) is read speed. Maybe a slow card can produce the freezes. 

    Do you have logging enabled? Then try without logging as it means many writes. Also the rescue system does many writes so disable it. Then there is nearly no write left and reading is only at speed of printing.
    Oh. I have select all of them. The rescue system have write data on sd card all time. I believe that the data write to RAM. I will tested it.
  • Mine resultats:
    pi@Repetier-Server:~ $ dd if=/dev/zero of=~/test.tmp bs=500K count=1024
    1024+0 records in
    1024+0 records out
    524288000 bytes (524 MB, 500 MiB) copied, 32.0775 s, 16.3 MB/s
    pi@Repetier-Server:~ $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches
    3
    pi@Repetier-Server:~ $ dd if=~/test.tmp of=/dev/null bs=500K count=1024
    1024+0 records in
    1024+0 records out
    524288000 bytes (524 MB, 500 MiB) copied, 35.7277 s, 14.7 MB/s

    Now i tested it without logs and rescue system. 6's layer and now it ok.

  • OK, will test next time with both enabled as counter test. Would be bad if that bit of writing would make a difference. And no it does not write to ram that would not be save during power loss cases. Would only help on connection problems.
  • Repetier said:
    OK, will test next time with both enabled as counter test. Would be bad if that bit of writing would make a difference. And no it does not write to ram that would not be save during power loss cases. Would only help on connection problems.
    If powerloss i has UPS for raspi. You can add checkbox for writing data into RAM. I think that it (a lot of write data into sdcard) can go kill sdcard. I make some prints and without the checkboxs it work OK! Thx for support!

  • For good sd cards the war leveling should prevent over usage of certain blocks so at least a bit bigger cards with free memory should not have problems with life time. In any case normal log is not useful - it is only useful if you need to debug problems, otherwise it is a file you will never look at. So at least disabling that is no problem.

    I should check if the library for log file does flush it all the time. That would increase time and writes and is not required. But I think it does not.
Sign In or Register to comment.