Oracle BRM: Are the processes of BRM getting hung frequently?

Are processes on your BRM server hanging frequently?

If so, work through the steps below to confirm that the relevant settings are properly configured and to collect the data needed to find the cause.


The following configuration entries help the CM and DM handle faulting processes more cleanly.

Try these CM/DM settings when you find many CM child processes that have been running for a very long time.

Add the following to the CM's pin.conf file.

- cm die_on_exception 1

This will leave the default signal disposition for SIGSEGV, causing the CM to die rather than get stuck.

Add the following to the DM (dm_oracle) pin.conf file.

- dm die_on_exception 1
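One way to confirm that both entries are in place is a small shell check. This is only a sketch: the function name has_die_on_exception is made up for illustration, and the pin.conf paths in the usage comment are assumptions about a typical installation layout, so adjust them to yours.

```shell
# Return 0 if the given pin.conf contains an active (uncommented)
# "- <program> die_on_exception 1" entry, non-zero otherwise.
# has_die_on_exception is a hypothetical helper, not a BRM utility.
has_die_on_exception() {
    conf_file=$1
    program=$2
    grep -Eq "^[[:space:]]*-[[:space:]]+${program}[[:space:]]+die_on_exception[[:space:]]+1([[:space:]]|\$)" "$conf_file"
}

# Usage (paths are assumptions; adjust to your installation):
#   has_die_on_exception "$PIN_HOME/sys/cm/pin.conf" cm || echo "CM entry missing"
#   has_die_on_exception "$PIN_HOME/sys/dm_oracle/pin.conf" dm || echo "DM entry missing"
```

Lines starting with # are treated as comments and do not count as a match.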

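1. Verify the locks on the database

Use SQL similar to the script below, adjusting it for your environment. The &1 substitution variable is the first argument passed to the script; here it sets the minimum number of seconds a session must have been blocked before it is reported.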

set serveroutput on size unlimited

set feedback off 

DECLARE

   v_num_sessions INTEGER := 0;

   CURSOR cv IS

SELECT dba_objects.object_name,

       locks_t.row#,

       locks_t.blocked_secs,

       locks_t.blocker_text,

       locks_t.blocked_text,

       locks_t.blocked_sql_text

  FROM (SELECT /*+ NO_MERGE */

               blocking_lock_session.username||'@'||blocking_lock_session.machine||'(SID='||blocking_lock_session.sid||') ['||

               blocking_lock_session.program||'/PID='||blocking_lock_session.process||']' as blocker_text,

               blocked_lock_session.username||'@'||blocked_lock_session.machine|| '(SID='||blocked_lock_session.sid||') ['||

               blocked_lock_session.program||'/PID='||blocked_lock_session.process||']' as blocked_text,

               blocked_lock_session.row_wait_obj#,

               blocked_lock_session.row_wait_file#,

               blocked_lock_session.row_wait_block#,

               blocked_lock_session.row_wait_row#,

               DBMS_ROWID.ROWID_CREATE (1,

                  blocked_lock_session.row_wait_obj#,

                  blocked_lock_session.row_wait_file#,

                  blocked_lock_session.row_wait_block#,

                  blocked_lock_session.row_wait_row#) row#,

               blocked_lock_session.seconds_in_wait blocked_secs,

               blocked_sql.sql_text blocked_sql_text

          FROM v$lock blocking_lock,

               v$session blocking_lock_session,

               v$lock blocked_lock,

               v$session blocked_lock_session,

               v$sql blocked_sql

         WHERE blocking_lock.block = 1

           AND blocking_lock.id1 = blocked_lock.id1

           AND blocking_lock.id2 = blocked_lock.id2

           AND blocked_lock.request > 0

           AND blocking_lock.sid = blocking_lock_session.sid

           AND blocked_lock.sid = blocked_lock_session.sid

           AND blocked_lock_session.sql_id = blocked_sql.sql_id

           AND blocked_lock_session.sql_child_number = blocked_sql.child_number

       ) locks_t,

       dba_objects

 WHERE locks_t.row_wait_obj# = dba_objects.object_id

   AND locks_t.blocked_secs > &1

ORDER BY locks_t.blocked_secs;

BEGIN

   FOR cv_rec IN cv LOOP

      dbms_output.put_line(

         '========= $Revision: 1.4 $ ($Date: 2013/09/16 13:15:22 $) ===========');

      v_num_sessions := v_num_sessions + 1;

      dbms_output.put_line('Locked object : '||

         cv_rec.object_name);

      dbms_output.put_line('Locked row#   : '||

         cv_rec.row#);

      dbms_output.put_line('Blocked for   : '||

         cv_rec.blocked_secs||' seconds');

      dbms_output.put_line('Blocker info. : '||

         cv_rec.blocker_text);

      dbms_output.put_line('Blocked info. : '||

         cv_rec.blocked_text);

      dbms_output.put_line('Blocked SQL   : '||

         cv_rec.blocked_sql_text);

   END LOOP;

   dbms_output.new_line;

   dbms_output.put_line('Found '||TO_CHAR(v_num_sessions)||

      ' blocked session(s).');

END;

/

exit;

Capture the output of the script above to investigate the source of the locks on the BRM database.
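A minimal shell wrapper for running the script and keeping its output might look like the sketch below. The script file name blocked_sessions.sql, the DB_CONN variable, and the capture_locks function are assumptions for illustration; the logon string in the usage comment is a placeholder.

```shell
# Run the lock report and save its output to a timestamped file.
# Assumptions: the SQL above is saved as blocked_sessions.sql, and DB_CONN
# holds a SQL*Plus logon string of the form user/password@service.
SQLPLUS=${SQLPLUS:-sqlplus}

capture_locks() {
    min_blocked_secs=$1                      # becomes &1 in the script
    out_file=${2:-locks_$(date +%Y%m%d_%H%M%S).log}
    # -s suppresses the SQL*Plus banner and prompts
    "$SQLPLUS" -s "$DB_CONN" @blocked_sessions.sql "$min_blocked_secs" > "$out_file"
    echo "$out_file"
}

# Usage: report sessions blocked for more than 60 seconds
#   DB_CONN='user/password@service' capture_locks 60
```

Note that a logon string passed on the command line is visible in the process list, so a wallet or OS-authenticated logon may be preferable where available.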

Collect from the infrastructure team:

Syslogs for the node that is down.

Collect from the DBA:

ASH (Active Session History) report

AWR report

Database alert logs

Capture the output of glance

Glance is the HP-UX performance monitor, comparable to “top” on other Unix systems. Capture the glance output for all the pages.

Record the memory utilization and CPU usage shown on those pages.

2. Verify the SAM output from the Unix OS

SAM output is available only to the “root” user.

The purpose is to identify the values of the following OS kernel parameters.

HP-UX specifics (the UNIX system administrator will know the best settings for the environment):

maxfiles

Set soft limit for the number of files a process is allowed to have open simultaneously.

Acceptable Values:

Minimum: 30

Maximum: 60000

Default: 60

Description: maxfiles specifies the system default soft limit for the number of files a process is allowed to have open at any given time. It is possible for a process to increase its soft limit and therefore open more than maxfiles files.

maxfiles_lim

Set hard limit for number of files a process is allowed to have open simultaneously.

Acceptable Values:

Minimum: 30

Maximum: 60000

Default: 1024

Description: maxfiles_lim specifies the system default hard limit for the number of open files a process may have. It is possible for a non-superuser process to increase its soft limit (maxfiles) up to this hard limit.

nfile

Set maximum number of files that can be open simultaneously on the system at any given time.

Acceptable Values:

Minimum: 14

Maximum: Memory limited

Default: ((16*(Nproc+16+MaxUsers))/10)+32+2*(Npty+Nstrpty)

Description: nfile defines the maximum number of files that can be open at any one time, system-wide. It is the number of slots in the file descriptor table. Be generous with this number because the required memory is minimal, and not having enough slots restricts system processing capacity.
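As a worked example of the nfile default formula, the shell sketch below evaluates it with integer arithmetic (integer division is an assumption about how the kernel evaluates it). The function name nfile_default and the sample inputs are illustrative, not values read from any system.

```shell
# Evaluate the nfile default formula with integer arithmetic:
#   ((16*(Nproc+16+MaxUsers))/10) + 32 + 2*(Npty+Nstrpty)
# nfile_default is a hypothetical helper; pass your own kernel values.
nfile_default() {
    nproc=$1; maxusers=$2; npty=$3; nstrpty=$4
    echo $(( (16 * (nproc + 16 + maxusers)) / 10 + 32 + 2 * (npty + nstrpty) ))
}

# Sample inputs (assumed for illustration): Nproc=276, MaxUsers=32,
# Npty=60, Nstrpty=60
nfile_default 276 32 60 60   # prints 790
```

With these inputs, 16*(276+16+32) = 5184, which divides down to 518; adding 32 and 2*(60+60) = 240 gives 790.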

nflocks

Specify the maximum combined total number of file locks that are available system-wide to all processes at any given time.

Acceptable Values:

Minimum: 2

Maximum: Memory limited

Default: 200

Description: nflocks gives the maximum number of file/record locks that are available system-wide. When choosing this number, note that one file may have several locks and databases that use lockf() may need an exceptionally large number of locks. Open and locked files consume memory and other system resources. These resources must be balanced against other system needs to maintain optimum overall system performance. Achieving an optimum balance can be quite complex, especially on large systems, because of wide variation in the kinds of applications being used on each system and the number and types of applications that might be running simultaneously, the number of local and/or remote users on the system, and many other factors.

nproc

Set the maximum number of processes that can exist simultaneously on the system.

Acceptable Values:

Minimum: 10

Maximum: Memory limited

Default: 20+(8 * maxusers)

Description: nproc specifies the maximum total number of processes that can exist simultaneously in the system. There are at least four system overhead processes at all times, and one entry is always reserved for the super-user. When the total number of processes in the system is larger than nproc, the system issues these messages:

At the system console:

proc: table is full

Also, if a user tries to start a new process from a shell, the following message prints on the user's terminal:

no more processes

If a user is executing fork() to create a new process, fork() returns -1 and sets errno to EAGAIN.

maxuprc

Set maximum number of simultaneous user processes.

Acceptable Values:

Minimum: 3

Maximum: Nproc-4

Default: 50

Description: maxuprc establishes the maximum number of simultaneous processes available to each user on the system. A user is identified by the user ID number, not by the number of login instances. Each user requires at least one process for the login shell, and additional processes for all other processes spawned in that process group (the default is usually adequate). The super-user is exempt from this limit. Pipelines need at least one simultaneous process for each side of a |. Some commands, such as cc, fc, and pc, use more than one process per invocation. If a user attempts to start a new process that would cause the total number of processes for that user to exceed maxuprc, the system issues an error message to the user:

no more processes 

If a user process executes a fork() system call to create a new process, causing the total number of processes for the user to exceed maxuprc, fork() returns -1 and sets errno to EAGAIN.
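To see which users are approaching maxuprc, one option is to count processes per user from "ps -ef" output. The sketch below assumes the usual "ps -ef" layout, with a header line and the user name in the first column; the function name procs_per_user is hypothetical, and the threshold of 40 in the usage line is illustrative only.

```shell
# Read `ps -ef` output on stdin and print "<count> <user>" for every
# user owning at least <threshold> processes, busiest first.
procs_per_user() {
    threshold=${1:-1}
    awk -v min="$threshold" '
        NR > 1 { count[$1]++ }               # skip the header line
        END {
            for (user in count)
                if (count[user] >= min)
                    print count[user], user
        }' | sort -rn
}

# Usage: list users with 40 or more processes
#   ps -ef | procs_per_user 40
```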

The difference between the soft and the hard limit is that a process can change its own soft limit up to the hard limit. It is recommended that both the nfile and maxfiles system parameters be increased.

IMPORTANT: Kernel parameter changes will not take effect until the system is rebooted.


3. Verify the CM status using pstack

The command "ps -ef | grep cm$" lists the CM processes that are running. Use the command "pstack <cm_pid>" to get the current call stack of a CM. The output helps identify the point where the process is stuck.
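When there are many CM children, the two commands can be combined into a loop. The sketch below is one way to do it, assuming the CM command line ends in "cm" (the same assumption the grep above makes) and that pstack is on the PATH; the function names cm_pids and dump_cm_stacks and the output file naming are hypothetical.

```shell
# Read `ps -ef` output on stdin and print the PID of every process whose
# command line ends in "cm" as a whole path component (e.g. .../bin/cm).
cm_pids() {
    awk '$NF ~ /(^|\/)cm$/ { print $2 }'
}

# Save a pstack snapshot of each CM process into out_dir.
# dump_cm_stacks is a hypothetical helper; pstack must be on PATH.
dump_cm_stacks() {
    out_dir=${1:-.}
    stamp=$(date +%Y%m%d_%H%M%S)
    ps -ef | cm_pids | while read -r pid; do
        pstack "$pid" > "$out_dir/cm_${pid}_${stamp}.pstack" 2>&1
    done
}

# Usage:
#   dump_cm_stacks /tmp/cm_stacks
```

Matching on the last field rather than grepping the whole line keeps the grep process itself out of the list. Taking two or three snapshots a few seconds apart makes it easier to tell a stuck process from a slow one.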


4. Verify the status of the DM

Use the commands below to capture the state of the DM.

1. ps -ef | grep dm_oracle

2. kill -USR1 <DM_pid>

This step helps identify the root of the problem when the DM (dm_oracle) process is unable to fetch information from the database because of locks or wait events on the database.


If a batch process such as pin_bill_accts or pin_inv_accts is hung, then use the pstack command to get the current status of the process and analyze further from there.


