Oracle BRM: Are the processes of BRM getting hung frequently?

Are your processes frequently hanging on the BRM server?

Work through the following checks to find where they get stuck.

Verify the locks on the database
Verify the output from SAM
Verify the CM status using pstack
Verify the DM status using kill -USR1

The following configuration settings make the CM and DM terminate on a fatal exception instead of hanging.

Try these CM/DM settings when you find many CM child processes that have been running for a very long time.

Add the following to the CM's pin.conf file.

- cm die_on_exception 1

This will leave the default signal disposition for SIGSEGV, causing the CM to die rather than get stuck.

Add the following to the DM (dm_oracle) pin.conf file.

- dm die_on_exception 1
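
Before restarting the CM and DM, it can help to confirm that both entries are actually present and not commented out. The sketch below assumes the `- <program> <keyword> <value>` pin.conf layout shown above; the function name is hypothetical and not part of BRM.

```shell
#!/bin/sh
# has_die_on_exception FILE PROGRAM
# Succeeds (exit status 0) if FILE contains an uncommented line of the form
#   - PROGRAM die_on_exception 1
# Hypothetical helper; it only reads the file with grep.
has_die_on_exception() {
    conf_file=$1
    program=$2
    grep -Eq "^[[:space:]]*-[[:space:]]+${program}[[:space:]]+die_on_exception[[:space:]]+1([[:space:]]|\$)" "$conf_file"
}
```

A typical call would be `has_die_on_exception "$PIN_HOME/sys/cm/pin.conf" cm && echo "CM entry present"`, and likewise with `dm` for the dm_oracle pin.conf. The paths are assumptions, so adjust them to where your pin.conf files live.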

1. Verifying the locks on the database

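Use SQL similar to the script below and adapt it to your environment. The script takes one argument (&1) and reports only sessions that have been blocked for more than that number of seconds.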

set serveroutput on size unlimited
set feedback off
DECLARE
   v_num_sessions INTEGER := 0;
   CURSOR cv IS
SELECT dba_objects.object_name,
       locks_t.row#,
       locks_t.blocked_secs,
       locks_t.blocker_text,
       locks_t.blocked_text,
       locks_t.blocked_sql_text
  FROM (SELECT /*+ NO_MERGE */
               blocking_lock_session.username||'@'||blocking_lock_session.machine||'(SID='||blocking_lock_session.sid||') ['||
               blocking_lock_session.program||'/PID='||blocking_lock_session.process||']' as blocker_text,
               blocked_lock_session.username||'@'||blocked_lock_session.machine|| '(SID='||blocked_lock_session.sid||') ['||
               blocked_lock_session.program||'/PID='||blocked_lock_session.process||']' as blocked_text,
               blocked_lock_session.row_wait_obj#,
               blocked_lock_session.row_wait_file#,
               blocked_lock_session.row_wait_block#,
               blocked_lock_session.row_wait_row#,
               DBMS_ROWID.ROWID_CREATE (1,
                  blocked_lock_session.row_wait_obj#,
                  blocked_lock_session.row_wait_file#,
                  blocked_lock_session.row_wait_block#,
                  blocked_lock_session.row_wait_row#) row#,
               blocked_lock_session.seconds_in_wait blocked_secs,
               blocked_sql.sql_text blocked_sql_text
          FROM v$lock blocking_lock,
               v$session blocking_lock_session,
               v$lock blocked_lock,
               v$session blocked_lock_session,
               v$sql blocked_sql
         WHERE blocking_lock.block = 1
           AND blocking_lock.id1 = blocked_lock.id1
           AND blocking_lock.id2 = blocked_lock.id2
           AND blocked_lock.request > 0
           AND blocking_lock.sid = blocking_lock_session.sid
           AND blocked_lock.sid = blocked_lock_session.sid
           AND blocked_lock_session.sql_id = blocked_sql.sql_id
           AND blocked_lock_session.sql_child_number = blocked_sql.child_number
       ) locks_t,
       dba_objects
 WHERE locks_t.row_wait_obj# = dba_objects.object_id
   AND locks_t.blocked_secs > &1
 ORDER BY locks_t.blocked_secs;
BEGIN
   FOR cv_rec IN cv LOOP
      dbms_output.put_line(
         '========= $Revision: 1.4 $ ($Date: 2013/09/16 13:15:22 $) ===========');
      v_num_sessions := v_num_sessions + 1;
      dbms_output.put_line('Locked object : '||
         cv_rec.object_name);
      dbms_output.put_line('Locked row# : '||
         cv_rec.row#);
      dbms_output.put_line('Blocked for : '||
         cv_rec.blocked_secs||' seconds');
      dbms_output.put_line('Blocker info. : '||
         cv_rec.blocker_text);
      dbms_output.put_line('Blocked info. : '||
         cv_rec.blocked_text);
      dbms_output.put_line('Blocked SQL : '||
         cv_rec.blocked_sql_text);
   END LOOP;
   dbms_output.new_line;
   dbms_output.put_line('Found '||TO_CHAR(v_num_sessions)||
      ' blocked session(s).');
END;
/
exit;

Capture the output of this script to investigate the source of the locks on the BRM database.
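
Because the script reads its threshold from `&1`, it can be run non-interactively with the threshold passed as the first script argument. This is a minimal sketch, assuming the PL/SQL above is saved as `blocked_sessions.sql` (a hypothetical file name) and that `sqlplus` is on the PATH. The function name and the `SQLPLUS` override variable are illustrative as well.

```shell
#!/bin/sh
# run_lock_report CONNECT_STRING MIN_BLOCKED_SECS [SCRIPT]
# Runs the lock report through SQL*Plus in silent mode (-s) and passes
# MIN_BLOCKED_SECS as the first script argument, which SQL*Plus substitutes
# for &1. Set SQLPLUS to another command to override the binary (assumption:
# useful for a dry run or a non-default install path).
run_lock_report() {
    connect=$1
    min_secs=$2
    script=${3:-blocked_sessions.sql}   # hypothetical file name
    "${SQLPLUS:-sqlplus}" -s "$connect" "@$script" "$min_secs"
}
```

For example, `run_lock_report "<user>/<password>@<db>" 60 > locks.log` would report sessions blocked for more than 60 seconds. Adding `set verify off` near the top of the script suppresses the old/new substitution echo and keeps the captured log cleaner. Keep in mind that a connect string given on the command line can be visible to other users in the process list.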

Collect from the infrastructure team:

Syslogs for the node that is down.

Collect from the DBA:

ASH (Active Session History) report

AWR report

Database alert logs

Capture the output of glance

Glance (HP GlancePlus) is the HP-UX performance monitor, comparable to “top” on other Unix systems. Capture the glance output for all the pages, in particular the memory utilization and CPU usage.

2. Verify SAM output from Unix OS

SAM can be run only by the “root” user.

The purpose is to identify the values of the following OS kernel parameters.

HP-UX specifics (the UNIX system administrator will know the best settings for the environment):

maxfiles

Set soft limit for the number of files a process is allowed to have open simultaneously.

Acceptable Values:

Minimum: 30

Maximum: 60000

Default: 60

Description: maxfiles specifies the system default soft limit for the number of files a process is allowed to have open at any given time. It is possible for a process to increase its soft limit and therefore open more than maxfiles files.

maxfiles_lim

Set hard limit for number of files a process is allowed to have open simultaneously.

Acceptable Values:

Minimum: 30

Maximum: 60000

Default: 1024

Description: maxfiles_lim specifies the system default hard limit for the number of open files a process may have. It is possible for a non-superuser process to increase its soft limit (maxfiles) up to this hard limit.

nfile

Set maximum number of files that can be open simultaneously on the system at any given time.

Acceptable Values:

Minimum: 14

Maximum: Memory limited

Default: (16*(Nproc+16+MaxUsers)/10)+32+2*(Npty+Nstrpty)

Description: nfile defines the maximum number of files that can be open at any one time, system-wide. It is the number of slots in the file descriptor table. Be generous with this number because the required memory is minimal, and not having enough slots restricts system processing capacity.
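
As a worked example of the default formula, the sketch below evaluates it with shell integer arithmetic. The input values (MaxUsers=32, Npty=60, Nstrpty=0) are illustrative assumptions, not recommendations. Nproc is taken from its own default of 20+(8*MaxUsers), listed under nproc. Read the real values from the target system before drawing conclusions.

```shell
#!/bin/sh
# nproc_default MAXUSERS -> 20 + 8*MAXUSERS
nproc_default() {
    echo $(( 20 + 8 * $1 ))
}

# nfile_default NPROC MAXUSERS NPTY NSTRPTY
# -> (16*(NPROC+16+MAXUSERS)/10) + 32 + 2*(NPTY+NSTRPTY)
# Assumes integer division, as shell arithmetic does.
nfile_default() {
    echo $(( 16 * ($1 + 16 + $2) / 10 + 32 + 2 * ($3 + $4) ))
}

maxusers=32                        # illustrative value, not a recommendation
nproc=$(nproc_default "$maxusers")
echo "nproc default: $nproc"
echo "nfile default: $(nfile_default "$nproc" "$maxusers" 60 0)"
```

With these inputs the script prints an nproc default of 276 and an nfile default of 670. A busy BRM host with many CM and DM children will usually need considerably more than the computed default.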

nflocks

Specify the maximum combined total number of file locks that are available system-wide to all processes at any given time.

Acceptable Values:

Minimum: 2

Maximum: Memory limited

Default: 200

Description: nflocks gives the maximum number of file/record locks that are available system-wide. When choosing this number, note that one file may have several locks and databases that use lockf() may need an exceptionally large number of locks. Open and locked files consume memory and other system resources. These resources must be balanced against other system needs to maintain optimum overall system performance. Achieving an optimum balance can be quite complex, especially on large systems, because of wide variation in the kinds of applications being used on each system and the number and types of applications that might be running simultaneously, the number of local and/or remote users on the system, and many other factors.

nproc

Set maximum number of processes that can exist simultaneously on the system.

Acceptable Values:

Minimum: 10

Maximum: Memory limited

Default: 20+(8 * maxusers)

Description: nproc specifies the maximum total number of processes that can exist simultaneously in the system. There are at least four system overhead processes at all times, and one entry is always reserved for the super-user. When the total number of processes in the system is larger than nproc, the system issues these messages:

At the system console:

proc: table is full

Also, if a user tries to start a new process from a shell, the following message prints on the user's terminal:

no more processes

If a user is executing fork() to create a new process, fork() returns -1 and sets errno to EAGAIN.

maxuprc

Set maximum number of simultaneous user processes.

Acceptable Values:

Minimum: 3

Maximum: Nproc-4

Default: 50

Description: maxuprc establishes the maximum number of simultaneous processes available to each user on the system. A user is identified by the user ID number, not by the number of login instances. Each user requires at least one process for the login shell, and additional processes for all other processes spawned in that process group (the default is usually adequate). The super-user is exempt from this limit. Pipelines need at least one simultaneous process for each side of a |. Some commands, such as cc, fc, and pc, use more than one process per invocation. If a user attempts to start a new process that would cause the total number of processes for that user to exceed maxuprc, the system issues an error message to the user:

no more processes

If a user process executes a fork() system call to create a new process, causing the total number of processes for the user to exceed maxuprc, fork() returns -1 and sets errno to EAGAIN.

The difference between the soft and the hard limit is that a process can change its own soft limit up to the hard limit. It is recommended that both the nfile and maxfiles system parameters be increased.

IMPORTANT: Kernel parameter changes will not take effect until the system is rebooted.
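
To see what the current shell and kernel are actually using, the sketch below prints the per-process soft and hard open-file limits with `ulimit`, and queries the kernel tunables with `kctune` when that command is available. `kctune` is assumed to be present on newer HP-UX releases (older ones use `kmtune -q <name>` instead), so verify the query command on your release. The function name is illustrative.

```shell
#!/bin/sh
# show_file_limits: print the soft/hard open-file limits of this shell and,
# if kctune exists (assumption: a newer HP-UX release), the related tunables.
# Nothing is modified; this is a read-only check.
show_file_limits() {
    echo "soft open-file limit: $(ulimit -S -n)"
    echo "hard open-file limit: $(ulimit -H -n)"
    if command -v kctune >/dev/null 2>&1; then
        # Query one tunable at a time so a missing or obsolete name
        # does not hide the others.
        for tunable in maxfiles maxfiles_lim nfile nflocks nproc maxuprc; do
            kctune "$tunable" || echo "$tunable: not available on this release"
        done
    else
        echo "kctune not found; skipping kernel tunable query"
    fi
}
```

Run `show_file_limits` as the same OS user that starts the CM and DM, because the `ulimit` values are inherited per login session and can differ from root's.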

3. Verify the CM status using pstack

The command "ps -ef | grep 'cm$'" lists the running CM processes. Run "pstack <cm_pid>" against a CM process to capture its current call stack. The output helps to identify the point where the process got stuck.
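
When there are many CM children, capturing their stacks one at a time is tedious. The sketch below filters `ps -ef` output for commands whose last field is `cm` (or a path ending in `/cm`), in the spirit of the grep above, and writes one pstack file per PID. The function names, output directory, and file naming are assumptions for illustration, and `pstack` must be installed on the host.

```shell
#!/bin/sh
# pids_from_ps NAME: read `ps -ef` output on stdin and print the PID (column 2)
# of every line whose last field is NAME or a path ending in /NAME.
pids_from_ps() {
    awk -v name="$1" '
        { n = split($NF, parts, "/") }
        parts[n] == name { print $2 }
    '
}

# dump_stacks NAME [OUTDIR]: run pstack on each matching process and save
# the output as OUTDIR/NAME_<pid>_<timestamp>.pstack
dump_stacks() {
    name=$1
    outdir=${2:-.}
    stamp=$(date +%Y%m%d_%H%M%S)
    for pid in $(ps -ef | pids_from_ps "$name"); do
        pstack "$pid" > "$outdir/${name}_${pid}_${stamp}.pstack" 2>&1
    done
}
```

For example, `dump_stacks cm /tmp/brm_hang` writes one file per CM process into an existing directory. Taking two or three dumps a few seconds apart makes it easier to tell a stuck process from one that is merely busy. The same helper accepts any other process name.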

4. Verify the status of the DM

Use the commands below and capture the output of the DM.

1. ps -ef | grep dm_oracle

2. kill -USR1 <DM_pid>

This step helps to identify the root of the problem when the DM (dm_oracle) process is unable to fetch information from the database because of locks or wait events on the database.

If a batch process such as pin_bill_accts or pin_inv_accts is hung, then use the pstack command to get the current status of the process and analyze further from there.
