How an ERROR log makes EasyDarwin restart

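As the title says, if the server writes an Error-level log message (for example through Utility::MyLogErr, which logs at qtssFatalVerbosity), the streaming server process is restarted. The mechanism works as follows: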

```cpp
void Utility::MyLogErr(char *msg)
{
    //qtss_printf(msg);
    // Every message logged here is reported with fatal verbosity
    QTSServerInterface::LogError(qtssFatalVerbosity, msg);
}

void QTSServerInterface::LogError(QTSS_ErrorVerbosity inVerbosity, char* inBuffer)
{
    QTSS_RoleParams theParams;
    theParams.errorParams.inVerbosity = inVerbosity;
    theParams.errorParams.inBuffer = inBuffer;

    // Pass the message to every module registered for the error-log role
    for (UInt32 x = 0; x < QTSServerInterface::GetNumModulesInRole(QTSSModule::kErrorLogRole); x++)
        (void)QTSServerInterface::GetModule(QTSSModule::kErrorLogRole, x)->CallDispatch(QTSS_ErrorLog_Role, &theParams);

    // If this is a fatal error, set the proper attribute in the RTSPServer dictionary
    if ((inVerbosity == qtssFatalVerbosity) && (sServer != NULL))
    {
        QTSS_ServerState theState = qtssFatalErrorState;
        (void)sServer->SetValue(qtssSvrState, 0, &theState, sizeof(theState));
    }
}
```
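Because MyLogErr always passes qtssFatalVerbosity, any message logged through it is treated as fatal and changes the server state. The following is a minimal, simplified C++ sketch of that path. MiniServer, Verbosity, ServerState and errorLogModules are hypothetical stand-ins that only mirror the structure of LogError; they are not EasyDarwin's actual API.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, reduced stand-ins for QTSS_ErrorVerbosity and QTSS_ServerState.
enum class Verbosity { Fatal, Warning, Message };
enum class ServerState { Running, FatalError, ShuttingDown };

struct MiniServer {
    ServerState state = ServerState::Running;
    // Callbacks playing the role of modules registered for the error-log role.
    std::vector<std::function<void(Verbosity, const std::string&)>> errorLogModules;

    void LogError(Verbosity verbosity, const std::string& msg) {
        // Dispatch the message to every error-log module (the CallDispatch loop).
        for (const auto& module : errorLogModules)
            module(verbosity, msg);
        // A fatal message also changes the server state; the run loop notices it later.
        if (verbosity == Verbosity::Fatal)
            state = ServerState::FatalError;
    }
};

// Like Utility::MyLogErr: every message is logged at fatal verbosity.
void MyLogErr(MiniServer& server, const std::string& msg) {
    server.LogError(Verbosity::Fatal, msg);
}
```

A warning passed to LogError leaves the state at Running, while the same kind of message sent through MyLogErr switches it to FatalError.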

Here is the documentation of the qtssSvrState attribute:

qtssSvrState = 8, r/w QTSS_ServerState The current state of the server. If a module sets the server state, the server will respond in the appropriate fashion. Setting to qtssRefusingConnectionsState causes the server to refuse connections, setting to qtssFatalErrorState or qtssShuttingDownState causes the server to quit.

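This attribute holds the server's current state. If a module sets it, the server reacts accordingly: setting it to qtssRefusingConnectionsState makes the server refuse connections, and setting it to qtssFatalErrorState or qtssShuttingDownState makes the server quit. Quitting only ends the serving process; the restart itself is performed by a separate parent process, as described below.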
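The documented reactions amount to a mapping from state to action. This is a hypothetical sketch: ServerState, ServerAction and ActionFor are illustrative names, and the real QTSS_ServerState enum has more members than shown here.

```cpp
#include <cassert>

// Hypothetical, reduced stand-in for QTSS_ServerState.
enum class ServerState { Starting, Running, RefusingConnections, FatalError, ShuttingDown };
enum class ServerAction { KeepServing, RefuseConnections, Quit };

// Maps a state to the reaction described in the qtssSvrState documentation.
ServerAction ActionFor(ServerState state) {
    switch (state) {
        case ServerState::RefusingConnections:
            return ServerAction::RefuseConnections;
        case ServerState::FatalError:
        case ServerState::ShuttingDown:
            return ServerAction::Quit;  // the serving process exits; restarting is the parent's job
        default:
            return ServerAction::KeepServing;
    }
}
```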
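In main(), the server is started with StartServer(); unless that returns qtssFatalErrorState, main() then calls RunServer():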

```cpp
int main(int argc, char * argv[])
{
    ...
    //This function starts, runs, and shuts down the server
    if (::StartServer(&theXMLParser, &theMessagesSource, thePort, statsUpdateInterval, theInitialState, dontFork, debugLevel, debugOptions) != qtssFatalErrorState)
    {
        ::RunServer();
        CleanPid(false);
        exit (EXIT_SUCCESS);
    }
    else
        exit(-1); //Can't start server, don't try again
}
```

```cpp
void RunServer()
{
    Bool16 restartServer = false;
    UInt32 loopCount = 0;
    UInt32 debugLevel = 0;
    Bool16 printHeader = false;
    Bool16 printStatus = false;

    //just wait until someone stops the server or a fatal error occurs.
    QTSS_ServerState theServerState = sServer->GetServerState();
    while ((theServerState != qtssShuttingDownState) &&
           (theServerState != qtssFatalErrorState))
    {
#ifdef __sgi__
        OSThread::Sleep(999);
#else
        OSThread::Sleep(1000);
#endif
        LogStatus(theServerState);

        if (sStatusUpdateInterval)
        {
            debugLevel = sServer->GetDebugLevel();
            printHeader = PrintHeader(loopCount);
            printStatus = PrintLine(loopCount);

            if (printStatus)
            {
                if (DebugOn(sServer) ) // debug level display or logging is on
                    DebugStatus(debugLevel, printHeader);

                if (!DebugDisplayOn(sServer))
                    PrintStatus(printHeader); // default status output
            }

            loopCount++;
        }

        if ((sServer->SigIntSet()) || (sServer->SigTermSet()))
        {
            //
            // start the shutdown process
            theServerState = qtssShuttingDownState;
            (void)QTSS_SetValue(QTSServerInterface::GetServer(), qtssSvrState, 0, &theServerState, sizeof(theServerState));

            if (sServer->SigIntSet())
                restartServer = true;
        }

        theServerState = sServer->GetServerState();
        if (theServerState == qtssIdleState)
            sServer->KillAllRTPSessions();
    }

    //
    // Kill all the sessions and wait for them to die,
    // but don't wait more than 5 seconds
    sServer->KillAllRTPSessions();
    for (UInt32 shutdownWaitCount = 0; (sServer->GetNumRTPSessions() > 0) && (shutdownWaitCount < 5); shutdownWaitCount++)
        OSThread::Sleep(1000);

    //Now, make sure that the server can't do any work
    TaskThreadPool::RemoveThreads();

    //now that the server is definitely stopped, it is safe to initiate
    //the shutdown process
    delete sServer;

    CleanPid(false);
    //ok, we're ready to exit. If we're quitting because of some fatal error
    //while running the server, make sure to let the parent process know by
    //exiting with a nonzero status. Otherwise, exit with a 0 status
    if (theServerState == qtssFatalErrorState || restartServer)
        ::exit (-2);//-2 signals parent process to restart server
}
```
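The control flow of RunServer boils down to: poll the state once per second, treat SIGINT or SIGTERM as a shutdown request, and choose the exit code at the end. A simplified sketch follows, assuming two hypothetical hooks (getState and pendingSignal) in place of sServer->GetServerState() and SigIntSet()/SigTermSet(); the sleep, status printing and session cleanup are left out.

```cpp
#include <cassert>
#include <functional>

// Hypothetical, reduced stand-ins for QTSS_ServerState and the SIGINT/SIGTERM flags.
enum class ServerState { Running, FatalError, ShuttingDown };
enum class PendingSignal { None, Int, Term };

// Simplified model of RunServer(): returns the exit code the child process ends with.
int RunLoopExitCode(const std::function<ServerState()>& getState,
                    const std::function<PendingSignal()>& pendingSignal) {
    bool restartServer = false;
    ServerState state = getState();
    // Just wait until someone stops the server or a fatal error occurs.
    while (state != ServerState::ShuttingDown && state != ServerState::FatalError) {
        // (The real loop sleeps one second and prints status output here.)
        PendingSignal sig = pendingSignal();
        if (sig != PendingSignal::None) {
            state = ServerState::ShuttingDown;  // start the shutdown process
            if (sig == PendingSignal::Int)
                restartServer = true;           // SIGINT: shut down, then let the parent restart us
            continue;                           // the loop condition now fails
        }
        state = getState();
    }
    // -2 asks the parent to restart; 0 corresponds to main() calling exit(EXIT_SUCCESS).
    return (state == ServerState::FatalError || restartServer) ? -2 : 0;
}
```

With a state source that turns FatalError after a few polls the function returns -2, the value the parent process treats as a restart request; a SIGTERM-style request returns 0.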

In RunServer(), once the server state becomes qtssFatalErrorState the loop ends, exit(-2) is called and the process terminates. How, then, does the server get restarted?
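The answer is that the streaming server runs as two processes once it has started: the child process does the actual streaming work, and the parent process supervises it. The parent checks whether the child has exited and, if it has, decides according to the configuration file whether to start a new child.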
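The core of the two-process model is fork() plus wait(), and one detail matters for the -2 convention: the parent only receives the low 8 bits of the child's exit code. A minimal POSIX sketch; ForkExitAndWait, WaitResult and SignedExitStatus are hypothetical helpers, not EasyDarwin code.

```cpp
#include <cassert>
#include <cerrno>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper type: what the parent learns about one child.
struct WaitResult {
    pid_t child;   // pid returned by fork() in the parent (-1 if fork failed)
    pid_t reaped;  // pid returned by wait()
    int status;    // raw status word filled in by wait()
};

// Forks a child that exits immediately with exitCode, then blocks in wait()
// until that child has exited and has been reaped.
WaitResult ForkExitAndWait(int exitCode) {
    WaitResult r{-1, -1, 0};
    r.child = fork();
    if (r.child < 0)
        return r;                 // fork failed
    if (r.child == 0)
        _exit(exitCode);          // child: terminate at once with the requested code
    // Parent: wait() blocks until a child exits; it returns -1 with errno == EINTR
    // when a handled signal interrupts it, so retry in that case.
    do {
        r.reaped = wait(&r.status);
    } while (r.reaped == -1 && errno == EINTR);
    return r;
}

// WEXITSTATUS yields only the low 8 bits (0..255), so exit(-2) arrives as 254.
// Converting to a signed 8-bit type recovers -2 (well-defined since C++20,
// and the behaviour of two's-complement compilers before that).
std::int8_t SignedExitStatus(int status) {
    return static_cast<std::int8_t>(WEXITSTATUS(status));
}
```

Calling ForkExitAndWait(-2) yields a status for which WEXITSTATUS is 254, and SignedExitStatus turns it back into -2, which is why the EasyDarwin parent casts the value to SInt8.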
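main() also installs sigcatcher as the handler for SIGPIPE, SIGHUP, SIGINT, SIGTERM, SIGQUIT and SIGALRM, as shown in the code below. Whenever one of these signals is caught, sigcatcher is called; if the signal is SIGINT, sigcatcher kills the child process.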

```cpp
int main(int argc, char * argv[])
{
    extern char* optarg;

    // on write, don't send signal for SIGPIPE, just set errno to EPIPE
    // and return -1
    //signal is a deprecated and potentially dangerous function
    //(void) ::signal(SIGPIPE, SIG_IGN);
    struct sigaction act;

#if defined(sun) || defined(i386) || defined(__x86_64__) || defined (__MacOSX__) || defined(__powerpc__) || defined (__osf__) || defined (__sgi_cc__) || defined (__hpux__) || defined (__linux__)
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    act.sa_handler = (void(*)(int))&sigcatcher;
#elif defined(__sgi__)
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    act.sa_handler = (void(*)(...))&sigcatcher;
#else
    act.sa_mask = 0;
    act.sa_flags = 0;
    act.sa_handler = (void(*)(...))&sigcatcher;
#endif
    (void)::sigaction(SIGPIPE, &act, NULL);
    (void)::sigaction(SIGHUP, &act, NULL);
    (void)::sigaction(SIGINT, &act, NULL);
    (void)::sigaction(SIGTERM, &act, NULL);
    (void)::sigaction(SIGQUIT, &act, NULL);
    (void)::sigaction(SIGALRM, &act, NULL);
}
```
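Registering one handler for several signals with sigaction can be shown in a self-contained form. This is a sketch: RecordSignal and InstallHandlers are hypothetical names, and the handler only records the signal number instead of killing a child process the way the article's sigcatcher does on SIGINT. It also zero-initialises the struct and uses the standard sa_handler signature rather than the platform-specific casts above.

```cpp
#include <cassert>
#include <csignal>
#include <signal.h>

// Written from the handler; volatile sig_atomic_t is the type the C and C++
// standards allow a signal handler to write safely.
static volatile std::sig_atomic_t gLastSignal = 0;

// Hypothetical stand-in for sigcatcher: it only records which signal arrived.
extern "C" void RecordSignal(int sig) {
    gLastSignal = sig;
}

// Installs RecordSignal for the same signals main() registers above.
// Returns false if any sigaction() call fails.
bool InstallHandlers() {
    struct sigaction act {};
    sigemptyset(&act.sa_mask);
    act.sa_flags = 0;
    act.sa_handler = RecordSignal;
    const int signals[] = {SIGPIPE, SIGHUP, SIGINT, SIGTERM, SIGQUIT, SIGALRM};
    for (int sig : signals)
        if (::sigaction(sig, &act, nullptr) != 0)
            return false;
    return true;
}
```

After InstallHandlers(), std::raise(SIGINT) runs RecordSignal before raise returns, so gLastSignal then equals SIGINT instead of the process being terminated.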

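main() also contains the following code. In the parent process, execution blocks in wait() until it returns. wait() works like this: as soon as the parent calls it, the parent blocks while wait() checks whether any child of the current process has exited. If it finds such a child (which has become a zombie), it collects the child's exit information, releases it completely and returns; if not, it keeps blocking until one appears. If a handled signal interrupts it, wait() returns -1 instead (errno is EINTR), which is the pid == -1 && status == 0 case in the code.

Because WEXITSTATUS returns only the low 8 bits of the child's exit code, the child's exit(-2) arrives as 254; casting it to SInt8 turns it back into -2. The parent then reacts as follows: exit code -1 (the server could not start) makes the parent exit too; any other nonzero exit code, including -2, or death by an unhandled signal, breaks out of the wait loop, and after a one-second pause RestartServer(theXMLFilePath) checks the configuration to decide whether to fork a new child; a clean exit with status 0 makes the parent exit as well.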

```cpp
int main(int argc, char * argv[])
{
    ...
    int status = 0;
    int pid = 0;
    pid_t processID = 0;

    if ( !dontFork) // if (fork)
    {
        //loop until the server exits normally. If the server doesn't exit
        //normally, then restart it.
        // normal exit means the following
        // the child quit
        do // fork at least once but stop on the status conditions returned by wait or if autoStart pref is false
        {
            processID = fork();
            Assert(processID >= 0);
            if (processID > 0) // this is the parent and we have a child
            {
                sChildPID = processID;
                status = 0;
                while (status == 0) //loop on wait until status is != 0;
                {
                    pid =::wait(&status);
                    SInt8 exitStatus = (SInt8) WEXITSTATUS(status);
                    //qtss_printf("Child Process %d wait exited with pid=%d status=%d exit status=%d\n", processID, pid, status, exitStatus);

                    if (WIFEXITED(status) && pid > 0 && status != 0) // child exited with status -2 restart or -1 don't restart
                    {
                        //qtss_printf("child exited with status=%d\n", exitStatus);

                        if ( exitStatus == -1) // child couldn't run don't try again
                        {
                            qtss_printf("child exited with -1 fatal error so parent is exiting too.\n");
                            exit (EXIT_FAILURE);
                        }
                        break; // restart the child
                    }

                    if (WIFSIGNALED(status)) // child exited on an unhandled signal (maybe a bus error or seg fault)
                    {
                        //qtss_printf("child was signalled\n");
                        break; // restart the child
                    }

                    if (pid == -1 && status == 0) // parent woken up by a handled signal
                    {
                        //qtss_printf("handled signal continue waiting\n");
                        continue;
                    }

                    if (pid > 0 && status == 0)
                    {
                        //qtss_printf("child exited cleanly so parent is exiting\n");
                        exit(EXIT_SUCCESS);
                    }

                    //qtss_printf("child died for unknown reasons parent is exiting\n");
                    exit (EXIT_FAILURE);
                }
            }
            else if (processID == 0) // must be the child
                break;
            else
                exit(EXIT_FAILURE);

            //eek. If you auto-restart too fast, you might start the new one before the OS has
            //cleaned up from the old one, resulting in startup errors when you create the new
            //one. Waiting for a second seems to work
            sleep(1);
        } while (RestartServer(theXMLFilePath)); // fork again based on pref if server dies

        if (processID != 0) //the parent is quitting
            exit(EXIT_SUCCESS);
    }
    ...
}
```
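Putting the pieces together, the parent's loop is a small supervisor. A condensed C++ sketch follows: Classify reproduces the order of the checks in the loop above, and Supervise forks children until Classify stops returning Restart. maxRestarts is a hypothetical stand-in for the RestartServer(theXMLFilePath) preference check, and the one-second pause between restarts is omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

enum class ParentAction { Restart, ExitFailure, ExitSuccess, KeepWaiting };

// Same decision order as the parent's wait loop above.
ParentAction Classify(pid_t pid, int status) {
    if (WIFEXITED(status) && pid > 0 && status != 0)
        return static_cast<std::int8_t>(WEXITSTATUS(status)) == -1 ? ParentAction::ExitFailure
                                                                    : ParentAction::Restart;
    if (WIFSIGNALED(status))
        return ParentAction::Restart;      // e.g. a segfault in the child
    if (pid == -1 && status == 0)
        return ParentAction::KeepWaiting;  // wait() interrupted by a handled signal
    if (pid > 0 && status == 0)
        return ParentAction::ExitSuccess;  // the child exited cleanly
    return ParentAction::ExitFailure;      // unknown reason
}

// Minimal supervisor: fork a child that runs childMain() and exits with its return
// value, and fork again while Classify() says Restart, at most maxRestarts times.
// Returns how many children were started.
int Supervise(const std::function<int()>& childMain, int maxRestarts) {
    int launches = 0;
    for (;;) {
        pid_t child = fork();
        if (child < 0)
            return launches;               // fork failed
        if (child == 0)
            _exit(childMain());            // child: run the "server" and report its exit code
        ++launches;
        ParentAction action;
        do {
            int status = 0;
            pid_t pid = wait(&status);
            action = Classify(pid, status);
        } while (action == ParentAction::KeepWaiting);
        if (action != ParentAction::Restart || launches > maxRestarts)
            return launches;
    }
}
```

For example, Supervise([] { return -2; }, 3) starts four children (the original plus three restarts), while a child returning -1 or 0 is started only once. Any other nonzero exit code also triggers a restart, just as in the original loop.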