Solved

Time limiter on rotation doesn't limit

I have a CheckDB job scheduled to run at 4 AM and a time-based rotation limiter for 180 minutes.


The job kicks off properly at 4 AM, but it ran for 4 1/2 hours this morning until I killed it. It ran past 3 hours yesterday as well.


It's running well and processing lots of DBs, it's just not stopping when it's supposed to.


Ok, there are a couple things here.

1st, if you're processing a big DB it won't stop in the middle of it.  It'll finish it and then stop.

2nd, the way the time estimation works is it looks at the last time the checkdb ran for the current DB. Then it looks at the size of the DB when it was last processed, and how big it is now. Then it calculates the speed it processed at last time and uses that to calculate how long it'll take this time, because the DB could be much bigger or smaller than it was the last time it ran. If it's calculated to take longer than you have left in your rotation, it skips that DB, moves on to the next one, and goes through the same process.


Now if you look in the SettingsDB table you'll see the DefaultTimeEstimateMins col. This tells MC what time you want to estimate for DBs that don't have any history in the log to go by. So if it's a new DB or it just hasn't gotten to it in the rotation yet, you could set this for say 10 mins and it'll use that as the time estimate. And the next time it runs, since it'll have an entry in the log, it'll do the calculations.
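
To make the estimate logic above a little more concrete, here's a rough Python sketch. This is not the actual MC code, just an illustration of the steps as described. The function and parameter names are made up for the example, and it assumes the "speed" is a simple runtime-per-MB ratio taken from the last run.

```python
from typing import Optional


def estimate_minutes(
    last_runtime_secs: Optional[float],
    last_size_mb: Optional[float],
    current_size_mb: float,
    default_estimate_mins: float,
) -> float:
    """Estimate how long the next CheckDB run will take for one DB, in minutes.

    All names here are hypothetical; this only mirrors the description above.
    """
    # No usable history in the log: fall back to the default estimate
    # (the role DefaultTimeEstimateMins plays in the settings table).
    if last_runtime_secs is None or not last_size_mb:
        return float(default_estimate_mins)

    # Scale the last runtime by how much the DB has grown or shrunk,
    # i.e. assume the same seconds-per-MB speed as last time.
    estimated_secs = last_runtime_secs * current_size_mb / last_size_mb
    return estimated_secs / 60.0


def fits_in_rotation(estimate_mins: float, minutes_left: float) -> bool:
    """A DB is skipped only when its estimate is longer than the time left."""
    return estimate_mins <= minutes_left


# A DB that took 45 secs at 100 MB and is now 200 MB is estimated at 1.5 mins.
doubled = estimate_minutes(45, 100, 200, default_estimate_mins=1)

# A DB with no history just gets the default.
no_history = estimate_minutes(None, None, 50, default_estimate_mins=10)
```

Note that the check happens before a DB starts. Once a DB is running it is allowed to finish, so a run can still go past the limit by however long the last DB takes.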


There are some limitations to this though.  

If there's a big difference in the workload on the box then it could be much faster or slower than expected.

If checkdb finds a lot of errors it will take longer.

And of course, other stuff related to perf.


Ok, I wanted you to understand how the process works before getting into things.


A couple Qs:

1. Is the DB it's stopping on really big?

2. Do you have the DefaultTimeEstimateMins col filled in?

3. Do all the DBs have an entry in the LogDetails table?

In this job, I'm processing a few thousand very small DBs. While it's running, I see the databases popping into CheckDBLogDetails as everything cycles through.


Looking in CheckDBLogDetailsLatest, OpRunTimeInSecs is in the 40-45 second range for the databases that get checked starting at 4 AM (my 11 PM job has a few larger databases which take considerably longer).

CheckSettingsDB has DefaultTimeEstimateMins set to 1 for the CHECKDB operation.


Not all of the DBs have an entry in LogDetails because I haven't gotten through all of my databases with MC yet.

I was just going through our last email chain and your initial tests with the rotation worked.  Tell you what, I'll see if I can put together a version that logs the time info so we can maybe see what's going on.

I'll send it to you in email and we can continue the conversation here.

I wanted to follow up for those following along here. The issue appears to be fixed. I worked with Andy in email and the issue was, as far as I can tell, SQL's fault. I've seen this kind of thing before. What most likely happens is that after writing some critical info to a table, I turn around and query it too fast and SQL doesn't return it. But if I wait a couple secs then it works just fine. The MC code in question works just fine. It works on every box in my lab, and on all the customer boxes I've got it on. But sometimes these glitches happen in SQL.


So I changed the way I was getting info from one routine to another to hopefully make it more stable.  So far it's looking good and that change will make it to the next release as well.
