Member Discussions



1. Absolute Deadlock Protection Sat Jan 17, 2004 [8:06 PM]
Tharn
Email not supplied
member since: Aug 31, 2002
Has anybody implemented absolute deadlock protection? If so, how? If not, I have a working piece of code I could turn into a snippet that has been proven to destroy deadlocks. Its only known downside is that it requires POSIX threads.

To put it mildly, it's not award-winning. But by adding two function calls (one of them in game_loop), a second thread continuously monitors the game for lockups. It's simplistic (less risky) and not terribly overbearing on the system, though I can't say for sure what its resource usage is.

All I'd ask is that my name remains in that piece of code. Simple enough. People like respect; it makes them friendlier. Any questions? Have I been beaten to the punch? I'd love feedback.


2. RE: A better brain may be required, I'm afraid Sat Jan 17, 2004 [10:49 PM]
muir
tmc-mailMIAUelvendesignsMIAUcom
member since: Sep 14, 2003
Destroying deadlocks has been done. Avoiding them, now that's a task for you. Nevertheless, I'd like to see what you cooked up :)

.


3. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [1:18 AM]
Tharn
Email not supplied
member since: Aug 31, 2002
Agreed. I should have remained quiet. Oh well, too late for that this time.
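/* ---------  probably existing code somewhere --------- */
#include <pthread.h>
#include <stdlib.h>
#include <sys/select.h>
#include <sys/time.h>

/* mud_down is the mud's own shutdown flag, declared elsewhere. */

/* Shared between the main thread and the watchdog thread. */
volatile int pmPulse, pmRun;
pthread_t    pmChild;

/* Watchdog thread: wakes every 15 seconds and checks for a pulse reset. */
void *pmUpdate(void *arg)
{ 
 struct timeval timewait;

 (void)arg;
 while(!mud_down)
 {
   /* select() with no descriptors is used as a 15 second sleep. */
   timewait.tv_sec  = 15;
   timewait.tv_usec = 0;
   select(0, NULL, NULL, NULL, &timewait);
  
   if( pmPulse == 1 )
   { 
    /* pmReset was not called since the last check: assume a lockup. */
    exit(-1);
   } 
   else
   { pmPulse = 1; }

 }
 
 pmRun = 0;
 return NULL;
}
/* Arm the watchdog. Returns -1 if already running or on failure. */
int pmStart()
{
  if(pmRun == 1) return -1;
  if(pthread_create(&pmChild, NULL, pmUpdate, NULL) != 0) return -1;
  pmRun = 1;
  return 0;
}
/* Disarm the watchdog. Returns -1 if it was not running. */
int pmStop()
{
 if(pmRun == 0) return -1;
 pthread_cancel(pmChild);  /* takes the thread ID by value, not a pointer */
 pmRun=0;
 return 0;
}
/* Called from the main loop to signal that it is still alive. */
int pmReset()
{
 pmPulse = 0;
 return 0;
}
/* --------- end of probably existing code somewhere --------- */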

That's it. With some careful linking, you just have to call pmStart to arm it and then call pmReset at least once every 15 seconds. game_loop was the perfect place for it.

And with that, I shall return to silence.


4. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [2:42 AM]
scandum
Email not supplied
member since: Aug 30, 2002
How about fixing the lockup whenever one occurs? Works for me.
http://tintin.sf.net - Kickin It Old Skool since 1992


5. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [8:26 AM]
Samson
Email not supplied
member since: Jul 24, 1999
Alternative method using signals - used by stock Smaug in new_descriptor:

Top of game_loop, with other signal stuff:

signal( SIGALRM, caught_alarm );

In game_loop, after the new_descriptor call:

set_alarm( 30 );

And again at the bottom of game_loop, to reset it ( not sure if this is needed ):

set_alarm( 0 );

The set_alarm function itself, which is just a wrapper:

void set_alarm( long seconds )
{
alarm( seconds );
}

And caught_alarm:

/*
 * LAG alarm!					-Thoric
 */
static void caught_alarm( int signum )
{
   bug( "ALARM CLOCK! In section %s", alarm_section );
   echo_to_all( AT_IMMORT, "Alas, the hideous malevalent entity known only as 'Lag' rises once more!", ECHOTAR_IMM );
   if( newdesc )
   {
      FD_CLR( newdesc, &in_set );
      FD_CLR( newdesc, &out_set );
      FD_CLR( newdesc, &exc_set );
      log_string( "clearing newdesc" );
   }

   game_loop();

   /* Clean up the loose ends. */
   close_mud( );

   /*
    * That's all, folks.
    */
   log_string( "Normal termination of game." );
   log_string( "Cleaning up Memory.&d" );
   cleanup_memory();
   exit( 0 );
}


I've tested this out with a deliberate infinite-loop command; it caught it and was able to restart game_loop without threads. Stock Smaug does not use the alarm this way. I added that part, but it was already using it in new_descriptor to break up connection problems. So I figured, why not?
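
A minimal sketch of the same alarm idea in a more conservative form, for comparison. It is not the Smaug code: the names watchdog_fired, watchdog_install and watchdog_arm are hypothetical, and instead of re-entering game_loop the handler only writes a message and calls abort(), since both are async-signal-safe while most of a mud's logging and I/O functions are not guaranteed to be.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* SIGALRM handler: restricted to async-signal-safe calls (write, abort). */
static void watchdog_fired(int signum)
{
    static const char msg[] = "watchdog: main loop stalled, aborting\n";

    (void)signum;
    if (write(STDERR_FILENO, msg, sizeof msg - 1) < 0) {
        /* Nothing useful can be done about a failed write here. */
    }
    abort();
}

/* Install the handler once at startup. Returns 0 on success, -1 on error. */
int watchdog_install(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = watchdog_fired;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Arm (seconds > 0) or disarm (seconds == 0) the timer.
 * Returns the seconds that were left on any previously armed timer. */
unsigned int watchdog_arm(unsigned int seconds)
{
    return alarm(seconds);
}
```

Under these assumptions the main loop would call watchdog_arm(30) at the top of each pass and watchdog_arm(0) at the bottom, mirroring the set_alarm calls above.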
SmaugMuds.org: http://www.smaugmuds.org
My Blog. Leave your political correctness at the door: http://www.iguanadons.net


6. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [1:12 PM]
Tharn
Email not supplied
member since: Aug 31, 2002
Quite interesting. I've looked at the Smaug code before, but I've never seen the alarm code do anything at all, even with the lag alert. But you have a very nice system there.


7. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [1:15 PM]
Tharn
Email not supplied
member since: Aug 31, 2002
I cannot manipulate the main thread as well as I would like, so quite simply, this works just as well. If it locks up, it dies. Then the reboot script can take care of bringing it back up. It's better than a 3+ hour lockup.


8. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [2:28 PM]
muir
tmc-mailMIAUelvendesignsMIAUcom
member since: Sep 14, 2003
>I cannot manipulate the main thread as well as I would like.

How about..

struct deadlock_spot
{
   // In use for N ticks
   long stagnant_ticks;

   // Contended variables
   var variables[];

   // Reset values
   val resets[];
};

// global
deadlock_spot possible_deadlocks[];

void deadlock_break()
{
    ...
    // if something deadlocked

    // find in poss_deadlocks the item that
    // has been stagnant for more than MAX_STAGNANT ticks

    // reset variables

    // proceed
    ...
}

Although what I think scandum meant is you shouldn't allow your code to deadlock, so instead of fixing runtime you fix before compiling :)

.


9. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [2:48 PM]
scandum
Email not supplied
member since: Aug 30, 2002
Doesn't gdb allow debugging deadlocks?

(Comment added by scandum on Sun Jan 18 17:20:21 2004)

Well, that'd be lockups; as I recall, gdb doesn't work when threading.
http://tintin.sf.net - Kickin It Old Skool since 1992


10. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [2:49 PM]
Tyche
Email not supplied
member since: Apr 4, 2000
Actually I became confused when I saw the term deadlock. A deadlock is when one thread/process is waiting on a resource held by another thread/process. A thread/process in an infinite loop or taking too much time is not a deadlock.

Your code contains a race condition with the global variable pmPulse. It's of little practical concern in this particular case. But it could be set to 1 after pmReset has been entered.
The Sourcery - http://sourcery.dyndns.org
TeensyMud - http://teensymud.kicks-ass.org
"A man can receive nothing, except it be given him from heaven."


11. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [4:02 PM]
muir
tmc-mailMIAUelvendesignsMIAUcom
member since: Sep 14, 2003
What you're saying is pmPulse could use a mutex?

.


12. RE: A better brain may be required, I'm afraid Sun Jan 18, 2004 [9:59 PM]
Tharn
Email not supplied
member since: Aug 31, 2002
My apologies for the confusion.

As for fixing it before compilation, I'd love to. But until I know that something is locking up, I can't fix it, and I can't watch it 24/7 for lockups.

And yes, I'm aware of the race condition. I'm glad you spotted it too. But if the mud is set to execute game_loop
at 4 loops per second or so, it will have somewhere around 60 chances (15 seconds x 4 loops) to set pmPulse back to 0
before the lockup timer makes a check. If you were to lower the time drastically, then yes, a mutex may become necessary.
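
A minimal sketch of what a mutex-guarded pulse flag could look like if the interval were shortened. The names pulse_lock, pulse_reset and pulse_check_and_arm are hypothetical and not part of the posted snippet. The idea is only that the watchdog's "check, then arm" step happens inside one critical section, so a reset from the main loop cannot land in the middle of it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names: none of these appear in the posted snippet. */
static pthread_mutex_t pulse_lock = PTHREAD_MUTEX_INITIALIZER;
static int pulse = 0;   /* 0 = main loop checked in, 1 = still waiting */

/* Called from the main loop: report that it is still making progress. */
void pulse_reset(void)
{
    int rc = pthread_mutex_lock(&pulse_lock);
    assert(rc == 0);
    pulse = 0;
    pthread_mutex_unlock(&pulse_lock);
}

/* Called by the watchdog thread after each sleep.
 * Returns 1 if the main loop has not checked in since the previous call;
 * otherwise returns 0. Either way the flag is left armed for the next
 * interval. */
int pulse_check_and_arm(void)
{
    int stalled;
    int rc = pthread_mutex_lock(&pulse_lock);

    assert(rc == 0);
    stalled = (pulse == 1);
    pulse = 1;
    pthread_mutex_unlock(&pulse_lock);
    return stalled;
}
```

In this sketch the watchdog would terminate the process when pulse_check_and_arm() returns 1, and game_loop would call pulse_reset() once per pass.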


13. RE: Absolute Deadlock Protection Mon Jan 19, 2004 [1:00 AM]
Fredrik
Email not supplied
member since: Feb 15, 2000
Just to clarify: As someone else already has stated, it appears that you're looking for infinite loop protection rather than deadlock protection.

But I'll answer the deadlock bit anyway. The best way I've found to avoid deadlocks is to have each thread or process always obtain all mutexes in the same order. That way, deadlocks won't happen.
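
A minimal sketch of that ordering rule for the two-mutex case. Ordering by address is an assumption made here for illustration (any fixed global order works), and lock_pair and unlock_pair are hypothetical helper names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Lock two distinct mutexes in a fixed global order (here: lowest
 * address first), no matter which order the caller names them in.
 * Two threads using this helper can never each hold one mutex while
 * waiting for the other. */
void lock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    assert(a != b);
    if ((uintptr_t)a > (uintptr_t)b) {
        pthread_mutex_t *tmp = a;
        a = b;
        b = tmp;
    }
    pthread_mutex_lock(a);
    pthread_mutex_lock(b);
}

/* Unlock order does not matter for deadlock avoidance. */
void unlock_pair(pthread_mutex_t *a, pthread_mutex_t *b)
{
    pthread_mutex_unlock(a);
    pthread_mutex_unlock(b);
}
```

With this helper, one thread calling lock_pair(&x, &y) and another calling lock_pair(&y, &x) both end up taking the mutexes in the same order.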

/Fredrik


14. RE: Topic: Rephrased for (hopefully) better understanding. Mon Jan 19, 2004 [2:23 AM]
Tharn
Email not supplied
member since: Aug 31, 2002
...and to clarify, I'm a moron. The rephrasal: I was talking about lockups. Using the second
thread, it recovers the system from its infinite loop by forcing a very hostile shutdown so that
the game may reboot rather than sit and wait for god only knows how long.

It was meant as a simple addition. It took 30 minutes to write and implement. It's not very
advanced. That's why I posted on this board. It's simplistic. It works. The watch thread has
minimal risk. No reason to overcomplicate something that works pretty dang well as is.

I know there is a race condition. I know I said the wrong thing. I apologize for the confusion
and frustration. It need not be clarified. Thanks anyway.

I know it's better to fix the problem than avoid it. But again, sometimes the problem isn't obvious
until it takes the mud out of commission [reboot/lockup] (as errors in C/C++ tend to do). When that
happens, I'll go fix it. Until I know that something is wrong, I can't bloody well fix it.

As for deadlocks, good advice on mutexes, but I've already found that out. Thanks anyway.


15. RE: A better brain may be required, I'm afraid Mon Jan 19, 2004 [4:47 AM]
Samson
Email not supplied
member since: Jul 24, 1999
Yeah, it never fires off in new_descriptor anymore. Not since adding a DNS resolver someone sent me. So it had been forgotten for quite some time until someone on TMS happened to be talking about ways to stop infinite loops and another person suggested the set_alarm thing. I was reading the thread and figured why not? Worth a shot.
SmaugMuds.org: http://www.smaugmuds.org
My Blog. Leave your political correctness at the door: http://www.iguanadons.net


16. RE: Topic: Rephrased for (hopefully) better understanding. Mon Jan 19, 2004 [9:50 AM]
Tyche
Email not supplied
member since: Apr 4, 2000
>I know it's better to fix the problem than avoid it. But again, sometimes the problem isn't obvious
>until it takes the mud out of commission...


It might be better to use abort() or kill(foo,bar) or whatever it takes to get a core dump in your termination routine. That way you can view the stack frame, if it hasn't been corrupted, of the main thread and find out where it was at.
The Sourcery - http://sourcery.dyndns.org
TeensyMud - http://teensymud.kicks-ass.org
"A man can receive nothing, except it be given him from heaven."


17. RE: Topic: Rephrased for (hopefully) better understanding. Mon Jan 19, 2004 [9:59 AM]
Tharn
Email not supplied
member since: Aug 31, 2002
Good thought. I was wondering how I might get a core dump if it doesn't crash. If abort() or kill() does that, then thanks for the tip.


18. RE: Topic: Rephrased for (hopefully) better understanding. Mon Jan 19, 2004 [10:55 AM]
jobo
Email not supplied
member since: May 25, 2000
If you want to create a coredump, you could raise(3) a signal, for instance a SIGSEGV. If your MUD freezes, you could also attach gdb directly to the running process, and do a backtrace from there.

Brian


19. RE: Topic: Rephrased for (hopefully) better understanding. Mon Jan 19, 2004 [1:03 PM]
Tyche
Email not supplied
member since: Apr 4, 2000
I think abort() raises the signal SIGABRT which, if not caught and handled by the application already, will produce a core dump assuming the rest of the criteria for core dumping are met (i.e. rlimit, writable directory, etc.). I could be wrong, but there's a sure way to find out.
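
A minimal sketch of that approach, assuming hypothetical helper names enable_core_dumps and lockup_detected. The first raises the soft core-file size limit to the hard limit at startup; the second is what a watchdog could call instead of exit(-1). Whether a core file actually appears still depends on the hard limit, the working directory and system settings.

```c
#include <stdlib.h>
#include <sys/resource.h>

/* Raise the soft core-file size limit to the hard limit.
 * Returns 0 on success, -1 on failure. */
int enable_core_dumps(void)
{
    struct rlimit lim;

    if (getrlimit(RLIMIT_CORE, &lim) != 0)
        return -1;
    lim.rlim_cur = lim.rlim_max;
    return setrlimit(RLIMIT_CORE, &lim);
}

/* Hypothetical termination routine for the watchdog.
 * abort() raises SIGABRT; unless the signal is caught, the default
 * action terminates the process and writes a core dump where the
 * system permits one. */
void lockup_detected(void)
{
    abort();
}
```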
The Sourcery - http://sourcery.dyndns.org
TeensyMud - http://teensymud.kicks-ass.org
"A man can receive nothing, except it be given him from heaven."



