Oracle BRM: Are BRM processes hanging frequently?
Are your processes on the BRM server hanging frequently?
Work through the following checks to confirm that the environment is properly configured.
Verify the locks on the database
Verify the output from SAM
Verify the CM status using pstack
Verify the DM status using kill -USR1
If you find many CM child processes that have been running for a very long time, try the CM/DM configuration settings below.
Add the following to the CM's pin.conf file.
- cm die_on_exception 1
This leaves the default signal disposition for SIGSEGV in place, so the CM dies rather than getting stuck.
Add the following to the DM (dm_oracle) pin.conf file.
- dm die_on_exception 1
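Before restarting the CM and DM, it can help to confirm that the entries are really active in each pin.conf. The following is a minimal sketch, not part of BRM: the function name has_die_on_exception is hypothetical, and it assumes that pin.conf comment lines start with "#".

```shell
# Succeeds (exit status 0) if the given pin.conf contains an active,
# uncommented "- <program> die_on_exception 1" entry.
# has_die_on_exception is a hypothetical helper name, not a BRM utility.
has_die_on_exception() {
    conf_file=$1
    program=$2   # "cm" or "dm"
    grep -Eq "^[[:space:]]*-[[:space:]]+${program}[[:space:]]+die_on_exception[[:space:]]+1([[:space:]]|\$)" "$conf_file"
}
```

A call such as has_die_on_exception /path/to/cm/pin.conf cm returns success only when the entry is present and not commented out. The path is a placeholder; use the location of pin.conf in your own installation.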
1. Verify the locks on the database
set serveroutput on size unlimited
set feedback off
DECLARE
v_num_sessions INTEGER := 0;
CURSOR cv IS
SELECT dba_objects.object_name,
locks_t.row#,
locks_t.blocked_secs,
locks_t.blocker_text,
locks_t.blocked_text,
locks_t.blocked_sql_text
FROM (SELECT /*+ NO_MERGE */
blocking_lock_session.username||'@'||blocking_lock_session.machine||'(SID='||blocking_lock_session.sid||') ['||
blocking_lock_session.program||'/PID='||blocking_lock_session.process||']' as blocker_text,
blocked_lock_session.username||'@'||blocked_lock_session.machine|| '(SID='||blocked_lock_session.sid||') ['||
blocked_lock_session.program||'/PID='||blocked_lock_session.process||']' as blocked_text,
blocked_lock_session.row_wait_obj#,
blocked_lock_session.row_wait_file#,
blocked_lock_session.row_wait_block#,
blocked_lock_session.row_wait_row#,
DBMS_ROWID.ROWID_CREATE (1,
blocked_lock_session.row_wait_obj#,
blocked_lock_session.row_wait_file#,
blocked_lock_session.row_wait_block#,
blocked_lock_session.row_wait_row#) row#,
blocked_lock_session.seconds_in_wait blocked_secs,
blocked_sql.sql_text blocked_sql_text
FROM v$lock blocking_lock,
v$session blocking_lock_session,
v$lock blocked_lock,
v$session blocked_lock_session,
v$sql blocked_sql
WHERE blocking_lock.block = 1
AND blocking_lock.id1 = blocked_lock.id1
AND blocking_lock.id2 = blocked_lock.id2
AND blocked_lock.request > 0
AND blocking_lock.sid = blocking_lock_session.sid
AND blocked_lock.sid = blocked_lock_session.sid
AND blocked_lock_session.sql_id = blocked_sql.sql_id
AND blocked_lock_session.sql_child_number = blocked_sql.child_number
) locks_t,
dba_objects
WHERE locks_t.row_wait_obj# = dba_objects.object_id
AND locks_t.blocked_secs > &1
ORDER BY locks_t.blocked_secs;
BEGIN
FOR cv_rec IN cv LOOP
dbms_output.put_line(
'========= $Revision: 1.4 $ ($Date: 2013/09/16 13:15:22 $) ===========');
v_num_sessions := v_num_sessions + 1;
dbms_output.put_line('Locked object : '||
cv_rec.object_name);
dbms_output.put_line('Locked row# : '||
cv_rec.row#);
dbms_output.put_line('Blocked for : '||
cv_rec.blocked_secs||' seconds');
dbms_output.put_line('Blocker info. : '||
cv_rec.blocker_text);
dbms_output.put_line('Blocked info. : '||
cv_rec.blocked_text);
dbms_output.put_line('Blocked SQL : '||
cv_rec.blocked_sql_text);
END LOOP;
dbms_output.new_line;
dbms_output.put_line('Found '||TO_CHAR(v_num_sessions)||
' blocked session(s).');
END;
/
exit;
Capture the output of the above script to investigate the source of the locks on the BRM database.
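The script reads its threshold from the SQL*Plus substitution variable &1, so the number of seconds must be passed as the first argument after the script name. A minimal sketch of a wrapper follows. It assumes the script was saved as check_locks.sql (a hypothetical file name) and that the connect string is held in a DB_CONNECT variable (also hypothetical); the SQLPLUS variable exists only so that a different binary path can be substituted.

```shell
# Run the lock report and save everything it prints to a file.
# Assumptions: check_locks.sql is the script above saved under that name,
# and DB_CONNECT holds a connect string for an account that can read
# v$lock, v$session, v$sql and dba_objects.
run_lock_report() {
    threshold_secs=$1   # only report sessions blocked longer than this
    out_file=$2
    "${SQLPLUS:-sqlplus}" -s "$DB_CONNECT" @check_locks.sql "$threshold_secs" > "$out_file" 2>&1
}
```

For example, run_lock_report 60 locks.out would report sessions blocked for more than 60 seconds. Note that a password placed in the connect string on the command line may be visible to other users in the process list, so a different login method may be preferable on shared hosts.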
Collect from the infrastructure team:
Syslogs for the node that is down.
Collect from the DBA:
ASH (Active Session History) report
AWR report
Database alert logs
Capture the output of glance
Glance is the HP-UX system performance monitor, comparable to "top" on other Unix systems. Capture the glance output for all the pages, and record the memory utilization and CPU usage from each page.
2. Verify SAM output from Unix OS
SAM output is available only to the "root" user.
The purpose is to identify the values of the following OS kernel parameters.
HP-UX specifics (the UNIX system administrator will know the best settings for the environment):
maxfiles
Set soft limit for the number of files a process is allowed to have open simultaneously.
Acceptable Values:
Minimum: 30
Maximum: 60000
Default: 60
Description: maxfiles specifies the system default soft limit for the number of files a process is allowed to have open at any given time. It is possible for a process to increase its soft limit and therefore open more than maxfiles files.
maxfiles_lim
Set hard limit for number of files a process is allowed to have open simultaneously.
Acceptable Values:
Minimum: 30
Maximum: 60000
Default: 1024
Description: maxfiles_lim specifies the system default hard limit for the number of open files a process may have. It is possible for a non-superuser process to increase its soft limit (maxfiles) up to this hard limit.
nfile
Set maximum number of files that can be open simultaneously on the system at any given time.
Acceptable Values:
Minimum: 14
Maximum: Memory limited
Default: ((16*(Nproc+16+MaxUsers))/10)+32+2*(Npty+Nstrpty)
Description: nfile defines the maximum number of files that can be open at any one time, system-wide. It is the number of slots in the file descriptor table. Be generous with this number because the required memory is minimal, and not having enough slots restricts system processing capacity.
nflocks
Specify the maximum combined total number of file locks that are available system-wide to all processes at any given time.
Acceptable Values:
Minimum: 2
Maximum: Memory limited
Default: 200
Description: nflocks gives the maximum number of file/record locks that are available system-wide. When choosing this number, note that one file may have several locks and databases that use lockf() may need an exceptionally large number of locks. Open and locked files consume memory and other system resources. These resources must be balanced against other system needs to maintain optimum overall system performance. Achieving an optimum balance can be quite complex, especially on large systems, because of wide variation in the kinds of applications being used on each system and the number and types of applications that might be running simultaneously, the number of local and/or remote users on the system, and many other factors.
nproc
Set the maximum number of processes that can exist simultaneously on the system.
Acceptable Values:
Minimum: 10
Maximum: Memory limited
Default: 20+(8 * maxusers)
Description: nproc specifies the maximum total number of processes that can exist simultaneously in the system. There are at least four system overhead processes at all times, and one entry is always reserved for the super-user. When the total number of processes in the system is larger than nproc, the system issues these messages:
At the system console:
proc: table is full
Also, if a user tries to start a new process from a shell, the following message prints on the user's terminal:
no more processes
If a user is executing fork() to create a new process, fork() returns -1 and sets errno to EAGAIN.
maxuprc
Set maximum number of simultaneous user processes.
Acceptable Values:
Minimum: 3
Maximum: Nproc-4
Default: 50
Description: maxuprc establishes the maximum number of simultaneous processes available to each user on the system. A user is identified by the user ID number, not by the number of login instances. Each user requires at least one process for the login shell, and additional processes for all other processes spawned in that process group. The default is usually adequate. The super-user is exempt from this limit. Pipelines need at least one simultaneous process for each side of a |. Some commands, such as cc, fc, and pc, use more than one process per invocation. If a user attempts to start a new process that would cause the total number of processes for that user to exceed maxuprc, the system issues an error message to the user:
no more processes
If a user process executes a fork() system call to create a new process, causing the total number of processes for the user to exceed maxuprc, fork() returns -1 and sets errno to EAGAIN.
The difference between the soft and the hard limit is that a process can change its own soft limit up to the hard limit. It is recommended that both the nfile and maxfiles system parameters be increased.
IMPORTANT: Kernel parameter changes will not take effect until the system is rebooted.
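To see how the default formulas for nproc and nfile scale, the arithmetic can be sketched in shell. This is only an illustration: the function names are hypothetical, integer division is an assumption about how the kernel rounds, and the input values in the example are made up rather than taken from any real system.

```shell
# Default nproc per the formula above: 20 + (8 * maxusers)
default_nproc() {
    maxusers=$1
    echo $(( 20 + 8 * maxusers ))
}

# Default nfile per the formula above:
# ((16 * (nproc + 16 + maxusers)) / 10) + 32 + 2 * (npty + nstrpty)
# Integer division is assumed here.
default_nfile() {
    nproc=$1
    maxusers=$2
    npty=$3
    nstrpty=$4
    echo $(( (16 * (nproc + 16 + maxusers)) / 10 + 32 + 2 * (npty + nstrpty) ))
}
```

With maxusers set to 32, default_nproc 32 gives 276, and default_nfile 276 32 60 60 gives 790. The values actually configured on a host should still be taken from SAM; depending on the HP-UX release they can usually also be listed with kmtune or kctune.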
3. Verify the CM status using pstack
The command "ps -ef | grep cm$" will provide the list of CM processes running. Use the command "pstack <cm_pid>" to get the current status of the CM. The output of this command will help to identify the point where the process got stuck.
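When many CM child processes are running, collecting a stack for each one by hand is tedious. The two commands above can be combined as in the following sketch. The function names and the output file naming are hypothetical, and the sketch assumes that the PID is the second column of the "ps -ef" output and that pstack is available on the host.

```shell
# Read "ps -ef" output on stdin and print the PID (second column)
# of every line that ends in "cm".
cm_pids() {
    grep 'cm$' | awk '{ print $2 }'
}

# Write one pstack capture per CM process into the given directory.
capture_cm_stacks() {
    out_dir=$1
    ps -ef | cm_pids | while read -r pid; do
        pstack "$pid" > "$out_dir/cm_${pid}.pstack" 2>&1
    done
}
```

Running capture_cm_stacks against an existing directory leaves one file per CM process, which can then be compared to see where each process is waiting.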
4. Verify the status of the DM
Use the commands below to capture the DM status.
1. ps -ef | grep dm_oracle
2. kill -USR1 <DM_pid>
This step is useful for identifying the root of the problem when the DM (dm_oracle) process is unable to fetch information from the database because of locks or wait events on the database.
If a batch process such as pin_bill_accts or pin_inv_accts is hung, then use the pstack command to get the current status of the process and analyze further from there.