Thanks for your explanations regarding the log! I'll try that, so I don't have to sit next to the printer for hours. That will drastically increase the wife-acceptance-factor! ;-)
Unfortunately the print yesterday completed successfully. --> No hints on the bug...
I'm off-site at a customer for the next 2 days, so I will not be able to do another test before Friday. I'll head back then.
Found the time to do some more test-prints. Still using 0.92.6 from 2015/Nov/07 with the two additions mentioned above.
This evening a 4h print completed flawlessly. I even felt dangerous and re-enabled the display.
Maybe the added debug outputs changed the "structure" of the allocated memory (variables got mixed up and are sorted differently in memory / mapped to different addresses), so the bug does not trigger anymore? Or maybe some other (non-relevant) bit gets flipped now, which does no harm?
Any idea? I will do the negative test tomorrow: remove the debug outputs and see if the bug strikes back...
I'm quite frustrated now. Have been bug-hunting the whole week at work and now the "important" bug is suddenly non-reproducible...
Welcome to my world :-) I'm also frustrated by the bug, because I never get it myself and only hear about it, so I have to take it on trust that there is a problem. With luck, any code change can make it vanish: it just needs to corrupt a different byte instead, and you never notice, at least if it wasn't an important byte. Even moving the declaration of the debug flag might make a difference, which is why I once suggested showing a memory map to see what is near it.
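To illustrate why moving a declaration can hide such a bug, here is a minimal sketch. It does not use firmware code: RAM is simulated as a flat byte array, and the buffer, the layouts and all names are hypothetical. The value 8 is only an example for a debug level. It shows an unchecked index writing one byte past a buffer, which hits the debug flag in one layout and an unused byte in another.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Simulated RAM: a flat byte array standing in for the MCU's data memory.
// All names and offsets here are hypothetical, not taken from the firmware.
struct SimulatedRam {
    std::array<std::uint8_t, 16> bytes{};

    void write(std::size_t address, std::uint8_t value) { bytes.at(address) = value; }
    std::uint8_t read(std::size_t address) const { return bytes.at(address); }
};

// A layout says where a 4-entry buffer and the debug flag live in that RAM.
struct Layout {
    std::size_t bufferStart;    // first byte of the buffer
    std::size_t debugFlagAddr;  // address of the debug flag
};

constexpr std::size_t kBufferLength = 4;

// Buggy write: the index is not range-checked, so index == kBufferLength
// lands on whatever variable happens to sit right behind the buffer.
void buggyBufferWrite(SimulatedRam& ram, const Layout& layout,
                      std::size_t index, std::uint8_t value) {
    ram.write(layout.bufferStart + index, value);
}

// Returns the debug flag after one off-by-one write of `value`.
std::uint8_t debugFlagAfterOverrun(const Layout& layout, std::uint8_t value) {
    SimulatedRam ram;
    buggyBufferWrite(ram, layout, kBufferLength, value);  // one past the end
    return ram.read(layout.debugFlagAddr);
}
```

With the flag directly behind the buffer (`Layout{0, 4}`) the overrun changes it; with the flag moved a few bytes away (`Layout{0, 8}`) the same overrun still happens but goes unnoticed.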
I will think about it. But how do you find an error that disappears when you try to locate it? I'm currently checking the code for warnings and possible hints from them. So it would help if you could post a config file that causes the error. That would be one that at least contains the code causing it. With all the possible code selections, many parts may hide in rarely used combinations.
Here is the Configuration.h as pastebin-snippet. If you need further information I will happily provide it!
I just installed Eclipse C/C++ and the Arduino-Plugin with this step-by-step tutorial. Hopefully I will find myself a little more comfortable if I may use the (3rd-best Java-)IDE which I know a little. Maybe I find something.
At least I expect to learn a little about C/C++ and Arduino-Programming ;-)
I still get the bug, even running the latest 0.92.6. It happens when I am printing "too fast" on complex geometry like circles with a lot of polygons. If I print "slow" it goes quite fine, but when I turn the speed up it just blocks the extruder; when I toggle dry run it starts again (and catches up with the absolute position). It really sounds like some memory is getting overwritten when the command buffer runs full, activating dry run?
You make it a bit difficult to decide. On one side an empty buffer is a special condition, and since it is a big array, a wrong index there can do quite some damage.
But on the other side M111 S14 enables dry run, so it would be no wonder that the printer goes to dry run. It even has a line number, so I have to assume it comes from the host. Could it be the point where you enabled/disabled it to get it running again?
I will check if I can copy the error with small line segments and fast moves.
I didn't toggle dry run in that log, so the host (Repetier-Host) must have sent it. But even so, I disabled the M111 command so that it does nothing. Still the extruder stopped working around this moment.
But indeed, recreating the error would be best for debugging. So here is a model which triggered the error really easily for me, running on a Megatronic v2, sliced with CuraEngine at medium speed. When setting the print to 200 % speed the extruder stops within around 10 seconds. model
Anyway I will try to continue the search for this bug, I really prefer the repetier firmware and the ability for selecting different tool modes.
(Note: The line error is because I just did a serial echo of "Disabled for bug testing". )
Ok, I will see if I can get a gcode that causes the error. CuraEngine is known to produce smaller line segments, so your theory about buffer underrun might be valid, especially since forcing it speeds up the error. Will report what I find out.
I also had the problem with the extruder stopping. I use Mega/RAMPS, graphical display and encoder. The problem disappeared after setting the endstop check to homing only, NOT always. (I have mechanical switches with "flying wires".)
jellehak and got many empty buffers with reduced buffer size, but no dry run. So maybe empty buffer is one factor, but not the only one.
I have now updated the firmware and moved debugLevel into a private variable and only one function can now change it. This function will always write
DebugLevel:8
so we can see in the log if dry run was modified by a function calling it explicitly. I also found a function that does indeed enable dry run when a defective sensor is detected. That would be my hottest candidate for now, so I added the error message
Disabling all heaters due to detected sensor defect.
so we know it came from this. I hope some users who get the error can check whether the change is triggered now and comes with the corresponding message. That would make finding the root problem much easier.
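The idea behind that change can be sketched roughly as follows. This is not the actual firmware code: the class name, the log container and the dry run bit value of 8 are assumptions made for illustration. The point is only that the level is private, so every legitimate change has to pass through one function that leaves a trace in the log.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the firmware's serial output; collects lines
// so the effect of each call can be inspected.
std::vector<std::string> g_log;

class DebugState {
public:
    // The only place that may change the level; every change is logged.
    static void setDebugLevel(std::uint8_t newLevel) {
        level_ = newLevel;
        g_log.push_back("DebugLevel:" + std::to_string(static_cast<int>(level_)));
    }
    static std::uint8_t debugLevel() { return level_; }
    static bool isDryRun() { return (level_ & kDryRunBit) != 0; }

private:
    static constexpr std::uint8_t kDryRunBit = 8;  // assumed bit for dry run
    static std::uint8_t level_;
};

std::uint8_t DebugState::level_ = 0;
```

If dry run ever turns on without a matching DebugLevel line in the log, the change did not come through the setter, which points towards memory corruption rather than program logic.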
Hi! Just wanted to state that I updated to 0.92.6 (the previous version had the same label, but a diff shows many changes) at around 7 PM today (2015-Nov-27). I intentionally did NOT apply the debugging modifications that fixed the error for me, and so far, after 2 hours of printing, I had no problem, not even while printing the model that would have triggered the stopped extruder within minutes.
I just wanted to put in, that I was experiencing the same problem. The extruder motor just stopped turning randomly on my newest printer. This is my 3rd printer and 2nd CoreXY and never seen that behavior before. I came to this post last night looking for a solution. I never tried to enable and disable dry-run but just updated to the newest Repetier straight away. I have not seen the issue since - So just another confirmation that the issue existed and seemingly does not exist anymore.
While I like that it is gone, I'm not certain why. I did not change the logic; it only requires calling a function now to change the level.
One vendor even told me that using an older compiler helped with that problem, so it remains a mystery. The good thing is, if it happens again, we at least get a log entry. And maybe it really was just a stupid compiler optimization that went wrong and doesn't with the new code. So let's hope no one reports this again :-)
From your description I think the log part is where you turned it on/off from host?
What about the log part where it got turned off? Any messages there? That is the critical part we need to find the problem. You should be able to see it in the saved log of the host (if you have enabled this). There you also see temperatures, so you see either a DebugLevel line or dropping temperatures.
I don't have those logs (I think). I did not write a log file. I usually put my PC to sleep after I start a print, I don't remember if I did that yesterday when it failed. I think I must have done something, because the first log I have in my current session is from 19:27 yesterday (have not closed Repetier since).
BUT I can help a little, I still have the actual temperature curve from the print:
There is a distinct temperature drop which matches nicely with when I think it roughly must have failed - Just, it does not turn off the heaters. I looked at some of my other successful prints and I can't seem to find a similar temperature drop in any of them.
Question - How can I still have the temperature logs, do they stay in the program until I close it? If yes, why don't the logs stay also? And can I somehow have the temperature logs even if my PC was sleeping at that point in time?
Perfect! Especially the combination with the temperature curves. This really looks like it gets triggered in the temperature manager with no prior message. So I have something to look at. The temp. curves show a hardware problem with your printer, which also explains why this does not happen to me. So I guess the attempt to disable the heaters was correct, but its implementation is faulty somewhere, as it gives no message.
If you look at your temp. curve you see temperature swings that are physically impossible. These must have triggered a defect rule causing the firmware to go to dry run. I've heard that some hotends have thermistors that start going crazy after a while. What is a bit strange is the second occurrence, where bed and extruder show the same swings starting at the same time. That makes me doubt it is the thermistor itself. It must be a factor influencing both measurements.
Same problem here. The printer has a "mixing nozzle" and three extruders.
I am now running the latest git version (work092 branch), downloaded today yesterday. I am printing PLA, and the heated bed is not used - the bed temperature is set to 0 in the slic3r config.
During my last print it failed because of HEATED_BED_MAXTEMP +5 being reached.
I know because I made this change in extruder.cpp around line 280:
if(Printer::isAnyTempsensorDefect())
{
Com::printFLN(PSTR("Disabling all heaters due to detected sensor defect."));
for(uint8_t i = 0; i < NUM_TEMPERATURE_LOOPS; i++)
> 01:08:03.748 : In Extruder::getHeatedBedTemperature, c->currentTemperatureC=190, NUM_TEMPERATURE_LOOPS=4
> 01:08:03.748 : Disabling all heaters due to HEATED_BED_MAX_TEMP.
> 01:08:03.748 : DebugLevel:14
so it switches off my extruder because my heated bed has a temp of 190 degrees. And I have 4 temp loops but only two sensors (a MAX6675 thermocouple in the nozzle and a beta3950 thermistor under the bed).
I got the exact same temperature on the next run as well:
That is a temperature curve of 4 prints. Follow the green line to see when I turn on and off.
What we see is that 1 and 2 were run from SD card. Then I ran one from the PC host (which went fine), and then I tried the SD card again and got a spike again. The model on the SD card and the one on the computer host were not the same. The model on the SD card was the same all the time.
Strange, no?
Mind you, it is not an SD card connected through a display. It is "just" the SD card. Maybe some interrupt handling?
To troubleshoot my problem I have now:
- Changed the SD card
- Changed the entire SD card holder
- Tried both binary and "normal" saving on SD card
It does seem to be speed dependent. As in, more spikes at higher speeds.
For clarification: the problem also occurs when printing via host. After some time it starts spiking too. With the SD card entirely disconnected it does seem to become good. This is still pending more tests.
I realize that this may not be super relevant, because what is central to this troubleshooting is why the extruder stops extruding but the heaters do not turn off when the printer thinks there is a sensor defect. But I still wanted to let you know.
Ok, just to sum up. That didn't do it (printing with SD card disconnected). Changing thermistor didn't do it. Changing RAMPS didn't do it. Changing MEGA didn't do it. I'm starting to think it is the extruder motor that has a problem sending out interference. Will change it tomorrow.
For now I will try to print something real slow and see if it makes it through.
@Lars How did you route the heater power and thermistor cables? Are they twisted to prevent crosstalk from influencing the result? That might make more difference than an SD card. I cannot believe an SD card or the printing source has an influence on this. Normally it is crosstalk, a short that happens for some moves, a thermistor under too much pressure from a screw, a screw that has rubbed off insulation somewhere, or voltage drops due to power changes.
which is a bed temperature of 26.82°C. I assume you do NOT have the max bed temp. set to 20°C? Maybe you could add the measured temperature to the error message so you know what happened. If it is max/min, it is a short/break for a small time. M105 is only called every x seconds, so it might miss the spikes that cause the error. If the interruption is short enough it will not show as min/max, as we take the average of several measurements. This normally protects a bit against false measurements but does not help over longer periods.
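A small sketch of that averaging effect, under assumptions that are not from the firmware: 16 samples per average, a 12-bit raw range, and made-up min/max limits. It only shows why one bad sample stays inside the limits after averaging, while a longer break does not.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Average of raw ADC readings, as an integer (truncating division).
// Returns 0 for an empty input.
std::uint32_t averageReading(const std::vector<std::uint16_t>& samples) {
    if (samples.empty()) return 0;
    std::uint32_t sum = 0;
    for (std::uint16_t s : samples) sum += s;
    return sum / static_cast<std::uint32_t>(samples.size());
}

// True if the value is at or beyond the assumed min/max limits, which is
// how a short or a broken wire would show up.
bool looksLikeShortOrBreak(std::uint32_t value, std::uint32_t minRaw,
                           std::uint32_t maxRaw) {
    return value <= minRaw || value >= maxRaw;
}
```

With 15 samples at 3000 and one at 4095, the average is 3068, well inside limits of, say, 10 and 4090. Only when most of the window reads 4095 does the averaged value reach the limit.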
You could call
Commands::printTemperatures(true);
to get a full output of the measured data for all sensors at the time of the error. Just to have all data complete. I find the 190°C a bit hard to believe, as the extruder had the same temperature in your first log. But maybe it really happened, who knows.
One thing I found out is that in the cases where we got no message before, it seems to be the bed triggering. Moreover, while all other defects need to be repeated 10 times, the bed defect requires only one false signal. I will change that to require 10 defects as well in the next update.
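The difference between "one false signal" and "10 repeats" can be sketched like this. The class is hypothetical and not the firmware's implementation; in particular, it assumes the 10 bad readings have to come in a row and that a single good reading resets the count.

```cpp
#include <cstdint>

// Counts consecutive implausible readings and reports a defect only after
// `threshold` of them in a row. Names are hypothetical.
class DefectDebouncer {
public:
    explicit DefectDebouncer(std::uint8_t threshold) : threshold_(threshold) {}

    // Feed one reading; returns true once the defect is considered confirmed.
    bool update(bool readingIsImplausible) {
        if (!readingIsImplausible) {
            consecutive_ = 0;  // a single good reading resets the count
            return false;
        }
        if (consecutive_ < threshold_) ++consecutive_;
        return consecutive_ >= threshold_;
    }

private:
    std::uint8_t threshold_;
    std::uint8_t consecutive_ = 0;
};
```

A debouncer built with a threshold of 1 trips on the first glitch, which corresponds to the old bed behaviour described above; with a threshold of 10 an isolated misread is ignored.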
First an unrelated thing you may like to hear: the babystepping works, so your fix was good. thanks.
Another thing: I get lots of communication errors, sometimes unknown commands, on this printer. I am using the shortest shielded USB cable I could find. Baud rate is 115200 (I tried reducing it, but even 57600 didn't stop the errors). Should I maybe swap out my Arduino Mega 2560 for a new one? I swapped it out before and it's a bit complicated, so I would prefer not to.
To answer your question:
26.82 is just above the ambient (room) temperature; it's the temp I would expect to see when the bed heater is off.
My maxtemp for the bed was set to 115 (changed it to 130 now).
I added some debug prints to all calls to setanytempsensordefect() yesterday, but it has not been printed yet, so I guess setanytempsensordefect() is not being called.
After adding that printTemperatures(true) command I get:
> 15:22:49.828 : In Extruder::getHeatedBedTemperature, c->currentTemperatureC=190NUM_TEMPERATURE_LOOPS=4
Ok, good to hear that at least babystepping works:-)
With NUM_TEMPERATURE_LOOPS the highest index is NUM_TEMPERATURE_LOOPS-1, so we can ignore the first value in your log with nonsense values.
The bed is always at index NUM_EXTRUDER, which is NUM_TEMPERATURE_LOOPS-1, so at least the 190 are really for the bed. -2 and -3 are unused since you have a mixing extruder that only uses extruder 0. They are still there since the complete program was coded to have one controller per extruder. The raw value of the bed is 120, which is near an extreme, so 190 could match that and explains the selection. It seems to be a rare misread that happens every now and then for you. The best part is a few lines later (0.14 seconds later), where it already reads
T:198.25 /200 B:27.17 /0 B@:0 @:0
So you see it is really only one drop in measured temperature, and it is hardware related. The fix I have just uploaded would have worked here. Now we need 10 repeated misread values to trigger the dry run, and that is quite unlikely, at least in your scenario. So I hope the problem is now gone for the bed as well.
Regarding the "missed line detected" errors, I believe there are 2 possible reasons. One is a real transfer/communication error. You see this in the log when some chars have a random value.
The other, more annoying reason is that somewhere the content gets mixed with older content. This seems to happen especially with longer or more output in a short time. Until now I could not find the real source of the problem. I recently had it on an Azteeg X3 board connected to a Pi 2 running our server. At some point I got data that had been sent more than 128 bytes earlier. I emphasize the 128 because that is the size of the ring buffer in the firmware. Since everything gets overwritten after 128 bytes of serial communication, the old data can no longer be stored in the firmware, making it impossible for the firmware to send these bytes. The next candidate is the serial->USB converter chip. The strange thing that happened with that board is that I connected it to Windows to test whether I get the errors there. I installed the latest FTDI driver and had no errors at all. Back on the Pi, the errors were gone completely. So did the driver install a fix for an error here? Does the Arduino AVR converter have a similar bug? Or was it something with Linux that was suddenly better? I am not sure where it happens, but I do not believe that it is a classic communication problem in that case.
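The 128-byte argument can be illustrated with a minimal ring buffer. This is only a sketch with a hypothetical interface, not the firmware's serial buffer; it models nothing but the overwrite behaviour, namely that a byte older than the buffer size no longer exists anywhere in the buffer.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Minimal ring buffer that keeps only the most recent N bytes.
// Interface is hypothetical.
template <std::size_t N>
class RingBuffer {
public:
    void push(std::uint8_t byte) {
        data_[head_] = byte;
        head_ = (head_ + 1) % N;
        if (count_ < N) ++count_;
    }

    // Looks back `age` bytes (0 = most recent). Returns false if that byte
    // was never written or has already been overwritten.
    bool peekBack(std::size_t age, std::uint8_t& out) const {
        if (age >= count_) return false;
        out = data_[(head_ + N - 1 - age) % N];
        return true;
    }

private:
    std::array<std::uint8_t, N> data_{};
    std::size_t head_ = 0;   // next write position
    std::size_t count_ = 0;  // valid bytes, capped at N
};
```

After pushing 200 bytes into a `RingBuffer<128>`, looking back 127 bytes still works, but looking back 128 or more fails. That is why repeated data from further back would have to come from somewhere outside such a buffer.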
The only good thing is that the current error correction seems to work very reliably, so I never had a pause during the time it happened.
Comments
bye,
jan
bye,
Jan
Problem disappeared with setting Endstops check just for homing , NOT always.
(i have mechanical switches with " flying wires")
#define ALWAYS_CHECK_ENDSTOPS 0 // default 1
I also changed :
#define PRINTLINE_CACHE_SIZE 25 // default 16
#define LOW_TICKS_PER_MOVE 400000 // default 250000
as buffer was running low while printing letters with lots of small segments
Printed now for more than 50 hours without any trouble.
best regards,
RAyWB