1. Absolute Deadlock Protection

Sat Jan 17, 2004 [8:06 PM]
Tharn
member since: Aug 31, 2002

Has anybody implemented absolute deadlock protection? If so, how? If not, I have a working piece of code I could snippetize that has been proven to destroy deadlocks. Its only known downside is that it requires POSIX threads.
To put it mildly, it's not award-winning, but by adding two function calls (one of them in game_loop), a second thread continuously monitors the game for lockups. It's simple (less risky) and not terribly taxing on the system, though I can't say for sure what its resource usage is.
All I'd ask is that my name remains in that piece of code. Simple enough: people like respect, and it makes them friendlier. Any questions? Have I been beaten to the punch? I'd love feedback.

2. RE: A better brain may be required, I'm afraid

Sat Jan 17, 2004 [10:49 PM]
muir
tmc-mailMIAUelvendesignsMIAUcom
member since: Sep 14, 2003

Destroying deadlocks has been done. Avoiding them, now that's a task for you. Nevertheless, I'd like to see what you cooked up :)

3. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [1:18 AM]
Tharn
member since: Aug 31, 2002

Agreed. I should have stayed quiet. Oh well, too late for that this time.
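/* --------- probably existing code somewhere --------- */
#include <pthread.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>

/* mud_down is the MUD's own shutdown flag, declared elsewhere.
   pmPulse and pmRun are shared with the watchdog thread without
   locking (see the race condition discussed later in this thread). */
int pmPulse, pmRun;
pthread_t pmChild;

/* Watchdog thread body. Every 15 seconds it checks whether game_loop
   has called pmReset() since the previous check; if not, it assumes
   a lockup and kills the whole process. */
void *pmUpdate( void *arg )
{
   struct timeval timewait;

   ( void ) arg;
   while( !mud_down )
   {
      timewait.tv_sec = 15;
      timewait.tv_usec = 0;
      select( 0, NULL, NULL, NULL, &timewait );   /* sleep 15 seconds */
      if( pmPulse == 1 )
         exit( -1 );        /* no pmReset() since the last check */
      pmPulse = 1;          /* game_loop must clear this before the next check */
   }
   pmRun = 0;
   return NULL;
}

/* Arm the watchdog; call once before entering game_loop. */
int pmStart( void )
{
   if( pmRun == 1 )
      return -1;
   if( pthread_create( &pmChild, NULL, pmUpdate, NULL ) != 0 )
      return -1;
   pmRun = 1;
   return 0;
}

/* Disarm the watchdog and reap its thread. */
int pmStop( void )
{
   if( pmRun == 0 )
      return -1;
   pthread_cancel( pmChild );      /* takes the thread ID, not its address */
   pthread_join( pmChild, NULL );
   pmRun = 0;
   return 0;
}

/* Heartbeat: call from game_loop on every pass. */
int pmReset( void )
{
   pmPulse = 0;
   return 0;
}
/* --------- end of probably existing code somewhere --------- */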
That's it. With some careful linking, you just call pmStart to arm it and then call pmReset at least once every 15 seconds; game_loop was the perfect place for that. And with that, I shall return to silence.

4. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [2:42 AM]
scandum
member since: Aug 30, 2002

How about fixing the lockup whenever one occurs? Works for me.

5. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [8:26 AM]
Samson
member since: Jul 24, 1999

Alternative method using signals, already used by stock Smaug in new_descriptor.
Top of game_loop, with the other signal stuff:
signal( SIGALRM, caught_alarm );
In game_loop, after the new_descriptor call:
set_alarm( 30 );
And again at the bottom of game_loop, to reset it (not sure if this is needed):
set_alarm( 0 );
The set_alarm function itself, which is just a wrapper:
void set_alarm( long seconds )
{
   alarm( seconds );
}
And caught_alarm:
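/*
 * LAG alarm! -Thoric
 */
static void caught_alarm( int signum )
{
   /* Caution: bug(), echo_to_all() and log_string() are generally not
    * async-signal-safe, and because this handler never returns, SIGALRM
    * may stay blocked (BSD-style signal()) or revert to its default
    * action (System V-style signal()) after the first catch. */
   bug( "ALARM CLOCK! In section %s", alarm_section );
   echo_to_all( AT_IMMORT, "Alas, the hideous malevolent entity known only as 'Lag' rises once more!", ECHOTAR_IMM );
   if( newdesc )
   {
      FD_CLR( newdesc, &in_set );
      FD_CLR( newdesc, &out_set );
      FD_CLR( newdesc, &exc_set );
      log_string( "clearing newdesc" );
   }
   /* Restart the main loop from inside the handler. It only returns on a
    * normal shutdown, so the cleanup below repeats what follows game_loop(). */
   game_loop();
   /* Clean up the loose ends. */
   close_mud( );
   /*
    * That's all, folks.
    */
   log_string( "Normal termination of game." );
   log_string( "Cleaning up Memory.&d" );
   cleanup_memory();
   exit( 0 );
}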
I've tested this with a deliberate infinite-loop command; it caught it and was able to restart game_loop without threads. Stock Smaug doesn't use the alarm this way (I added that part), but it was already using it in new_descriptor to break up connection problems, so I figured, why not?

6. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [1:12 PM]
Tharn
member since: Aug 31, 2002

Quite interesting. I've looked at the Smaug code before, but I've never seen the alarm code do anything at all, even with the lag alert. But you have a very nice system there.

7. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [1:15 PM]
Tharn
member since: Aug 31, 2002

I can't manipulate the main thread as well as I would like, so quite simply, this works just as well: if it locks up, it dies, and then the reboot script takes care of bringing it back up. It's better than a 3+ hour lockup.

8. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [2:28 PM]
muir
tmc-mailMIAUelvendesignsMIAUcom
member since: Sep 14, 2003

>I cannot manipulate the main thread as well as I would like.
How about...
struct deadlock_spot
{
   // In use for N ticks
   long stagnant_ticks;
   // Contended variables
   var variables[];
   // Reset values
   val resets[];
};
// global
deadlock_spot possible_deadlocks[];
void deadlock_break()
{
   ...
   // if something deadlocked
   // find in possible_deadlocks the item that
   // has been stagnant for more than MAX_STAGNANT ticks
   // reset variables
   // proceed
   ...
}
Although I think what scandum meant is that you shouldn't let your code deadlock in the first place: instead of fixing it at runtime, you fix it before compiling :)

9. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [2:48 PM]
scandum
member since: Aug 30, 2002

Doesn't gdb allow debugging deadlocks?
(Comment added by scandum on Sun Jan 18 17:20:21 2004)
Well, that'd be lockups. Then again, I recall gdb doesn't work with threading.

10. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [2:49 PM]
Tyche
member since: Apr 4, 2000

Actually, I became confused when I saw the term deadlock. A deadlock is when two or more threads/processes each wait on a resource held by another, so none of them can proceed. A thread/process stuck in an infinite loop or taking too much time is not a deadlock.
Your code contains a race condition on the global variable pmPulse. It's of little practical concern in this particular case, but the watchdog thread could set it to 1 right after pmReset has cleared it, so that reset is lost.

12. RE: A better brain may be required, I'm afraid

Sun Jan 18, 2004 [9:59 PM]
Tharn
member since: Aug 31, 2002

My apologies for the confusion.
As for fixing it before compilation, I'd love to, but until I know that something is locking up, I can't fix it, and I can't watch it 24/7 for lockups.
And yes, I'm aware of the race condition; I'm glad you spotted it too. But if the MUD runs game_loop at 4 passes per second or so, it gets around 60 chances (15 seconds × 4) to set pmPulse back to 0 before the lockup timer checks. If you were to lower the timeout drastically, then yes, a mutex might become necessary.

13. RE: Absolute Deadlock Protection

Mon Jan 19, 2004 [1:00 AM]
Fredrik
member since: Feb 15, 2000

Just to clarify: as someone else has already stated, it appears that you're looking for infinite-loop protection rather than deadlock protection.
But I'll answer the deadlock bit anyway. The best way I've found to avoid deadlocks is to have every thread or process always acquire mutexes in the same global order. That way, no cycle of waiting threads can form, so deadlocks between those mutexes can't happen.
/Fredrik

14. RE: Topic: Rephrased for (hopefully) better understanding.

Mon Jan 19, 2004 [2:23 AM]
Tharn
member since: Aug 31, 2002

...and to clarify that I'm a moron, here's the rephrasing: I was talking about lockups. The second thread recovers the system from an infinite loop by forcing a very hostile shutdown, so that the game can reboot rather than sit and wait for god only knows how long.
It was meant as a simple addition. It took 30 minutes to write and implement. It's not very advanced; that's why I posted on this board. It's simplistic. It works. The watch thread has minimal risk. No reason to overcomplicate something that works pretty dang well as is.
I know there is a race condition. I know I said the wrong thing. I apologize for the confusion and frustration. It doesn't need to be clarified further. Thanks anyway.
I know it's better to fix the problem than to avoid it. But again, sometimes the problem isn't obvious until it takes the MUD out of commission (reboot/lockup), as errors in C/C++ tend to do. When that happens, I'll go fix it. Until I know that something is wrong, I can't bloody well fix it.
As for deadlocks, good advice about mutexes, but I've already found that out. Thanks anyway.

15. RE: A better brain may be required, I'm afraid

Mon Jan 19, 2004 [4:47 AM]
Samson
member since: Jul 24, 1999

Yeah, it never fires in new_descriptor anymore, not since I added a DNS resolver someone sent me. So it had been forgotten for quite some time, until someone on TMS happened to be talking about ways to stop infinite loops and another person suggested the set_alarm approach. I was reading the thread and figured, why not? Worth a shot.

16. RE: Topic: Rephrased for (hopefully) better understanding.

Mon Jan 19, 2004 [9:50 AM]
Tyche
member since: Apr 4, 2000

>I know it's better to fix the problem than to avoid it. But again, sometimes the problem isn't obvious until it takes the MUD out of commission...
It might be better to use abort(), kill(foo, bar), or whatever it takes to get a core dump in your termination routine. That way you can view the main thread's stack frame (if it hasn't been corrupted) and find out where it was stuck.

17. RE: Topic: Rephrased for (hopefully) better understanding.

Mon Jan 19, 2004 [9:59 AM]
Tharn
member since: Aug 31, 2002

Good thought. I was wondering how I might get a core dump if it doesn't crash. If abort() or kill() does that, then thanks for the tip.

18. RE: Topic: Rephrased for (hopefully) better understanding.

Mon Jan 19, 2004 [10:55 AM]
jobo
member since: May 25, 2000

If you want to create a core dump, you could raise(3) a signal, for instance SIGSEGV. If your MUD freezes, you could also attach gdb directly to the running process and do a backtrace from there.
Brian

19. RE: Topic: Rephrased for (hopefully) better understanding.

Mon Jan 19, 2004 [1:03 PM]
Tyche
member since: Apr 4, 2000

I think abort() raises the signal SIGABRT, which, if not caught and handled by the application already, will produce a core dump, assuming the rest of the criteria for core dumping are met (e.g. the core-size rlimit, a writable directory, etc.). I could be wrong, but there's a sure way to find out.