DBA Sensation

September 29, 2010

root.sh failed on 2nd node when installing Grid Infrastructure

Filed under: [RAC] — zhefeng @ 12:39 pm

When I ran root.sh as the last step of the Grid Infrastructure installation on the second node, it failed (it had succeeded on the 1st node):
root.sh failed on the second node with the following errors
——————————————————-
DiskGroup DATA1 creation failed with the following message:
ORA-15018: diskgroup cannot be created
ORA-15072: command requires at least 1 regular failure groups, discovered only 0

Oracle gives the reason: when you are using multipath storage for ASM, you have to pre-configure the oracleasm configuration file as below:

On all nodes,

1. Modify the /etc/sysconfig/oracleasm with:

ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

2. Restart ASMLib (on all nodes except the 1st node):
# /etc/init.d/oracleasm restart

3. Deconfigure the root.sh settings on all nodes except the 1st node:
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force

4. Run root.sh again on the 2nd node (or other nodes)
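To confirm the new scan order took effect after restarting ASMLib, a quick check like the following can be run before re-executing root.sh (a minimal sketch; the device paths are generic, not from the original failure):

# confirm the scan order / exclude settings are in place
grep -E 'SCANORDER|SCANEXCLUDE' /etc/sysconfig/oracleasm
# list the ASM disks ASMLib can see after the restart
/etc/init.d/oracleasm listdisks
# compare major/minor numbers: the stamped disks should now map to the dm-* (multipath)
# devices rather than the underlying sd* paths
ls -l /dev/oracleasm/disks/ /dev/dm-*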

Oracle Metalink Doc:
11GR2 GRID INFRASTRUCTURE INSTALLATION FAILS WHEN RUNNING ROOT.SH ON NODE 2 OF RAC [ID 1059847.1]

September 27, 2010

how to deinstall the failed 11gR2 grid infrastructure

Filed under: [RAC] — zhefeng @ 10:39 am

Two parts are involved: first deconfigure, then deinstall

Deconfigure and Reconfigure of Grid Infrastructure Cluster:

Identify the cause of the root.sh failure by reviewing the logs in $GRID_HOME/cfgtoollogs/crsconfig and $GRID_HOME/log. Once the cause is identified, deconfigure and reconfigure with the steps below. Please keep in mind that you will need to wait until each step finishes successfully before moving to the next one:

For Steps 1 and 2, you can skip any node(s) on which you didn’t execute root.sh yet.

Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes, except the last one.

Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command will also zero out the OCR and voting disk.

Step 3: As root, run $GRID_HOME/root.sh on the first node.

Step 4: As root, run $GRID_HOME/root.sh on all other node(s), except the last one.
Step 5: As root, run $GRID_HOME/root.sh on the last node.
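As a concrete illustration of Steps 1-5 on a two-node cluster (the node names racnode1/racnode2 are made up for the example):

# as root on racnode1 (all nodes except the last one)
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
# as root on racnode2 (the last node; this also wipes the OCR and voting disk)
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode
# then rerun root.sh as root: first node first, last node last
$GRID_HOME/root.sh     # on racnode1
$GRID_HOME/root.sh     # on racnode2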

Deinstall of Grid Infrastructure Cluster:

Case 1: “root.sh” never ran on this cluster: as the grid user, execute $GRID_HOME/deinstall/deinstall

Case 2: “root.sh” already ran: follow the steps below. Please keep in mind that you will need to wait until each step finishes successfully before moving to the next one:

Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes, except the last one.

Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command will also zero out the OCR and voting disk.

Step 3: As grid user, run $GRID_HOME/deinstall/deinstall

September 7, 2010

Oracle 10g ASM/RAW storage migration

Filed under: [RAC] — zhefeng @ 9:47 am

Objective:
We want to migrate the whole shared storage from the old SAN to a new SAN without re-installing the whole Oracle RAC.

Scenario:
1.Current structure
[Nodes]
## eth1-Public
10.0.0.101 vmrac01 vmrac01.test.com
10.0.0.102 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.0.0.103 vmracvip01 vmracvip01.test.com
10.0.0.104 vmracvip02 vmracvip02.test.com

[Storage]
Both ORACLE_HOME are local:
ORACLE_HOME=/database/oracle/10grac/db
CRS_HOME=/database/oracle/10grac/crs

Shared LUN display (3 partitions, 2*256M for OCR&VOTING, 1*20G for ASM)
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

OCR and Voting are on RAW device: /dev/sdb1 /dev/sdb2

ASM disks
bash-3.1$ export ORACLE_SID=+ASM1
bash-3.1$ asmcmd
ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 19971 17925 0 17925 0 DG1/

2. New storage (sdc 10G)
1). new LUN added
[root@vmrac01 bin]# fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 535 4192965 82 Linux swap / Solaris
/dev/sda3 536 3263 21912660 83 Linux

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

2). Partition the new LUN into 3 partitions
Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 1 32 257008+ 83 Linux
/dev/sdc2 33 64 257040 83 Linux
/dev/sdc3 65 1305 9968332+ 83 Linux

3). clone data from previous raw disks
**Shut down the DB and CRS first to make sure there are no changes to the raw disks!
#dd if=/dev/raw/raw1 of=/dev/sdc1
514017+0 records in
514017+0 records out
263176704 bytes (263 MB) copied, 252.812 seconds, 1.0 MB/s

#dd if=/dev/raw/raw2 of=/dev/sdc2
514080+0 records in
514080+0 records out
263208960 bytes (263 MB) copied, 267.868 seconds, 983 kB/s
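Before rebinding, it is worth confirming the copies are byte-identical. A minimal check (using the byte counts reported by dd above) might be:

# compare exactly the number of bytes that dd reported copying
cmp -n 263176704 /dev/raw/raw1 /dev/sdc1 && echo "raw1 copy OK"
cmp -n 263208960 /dev/raw/raw2 /dev/sdc2 && echo "raw2 copy OK"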

4). "Cheating" Oracle by re-binding to the new devices on both nodes
**old binding
Step1: add entries to /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: For the mapping to have immediate effect, run below command
#raw /dev/raw/raw1 /dev/sdb1
#raw /dev/raw/raw2 /dev/sdb2

Step3: Run the following commands and add them the /etc/rc.local file.
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdb1
#chown oracle:dba /dev/sdb2
#chmod 660 /dev/sdb1
#chmod 660 /dev/sdb2

**new binding on both nodes
Step1: edit /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: mapping immediately
#raw /dev/raw/raw1 /dev/sdc1
#raw /dev/raw/raw2 /dev/sdc2

Step3: set permissions and add the commands to /etc/rc.local
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdc1
#chown oracle:dba /dev/sdc2
#chmod 660 /dev/sdc1
#chmod 660 /dev/sdc2

5). Start up CRS and the Oracle DB, then check the database; everything works fine after switching the raw disks!
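For the record, the checks at this point look something like the following (the database name v10c is assumed from the other posts about this cluster; substitute your own):

# clusterware daemons healthy?
crsctl check crs
# all resources (instances, ASM, listeners, VIPs) back online?
crs_stat -t
# raw bindings now point at the new LUN
raw -qa
# database reachable
srvctl status database -d v10c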

3. ASM disk group migration
1). Mark the new disk sdc3 on one node
# /etc/init.d/oracleasm createdisk VOL2 /dev/sdc3
Marking disk “/dev/sdc3” as an ASM disk: [ OK ]

2). scan disk on the other node
[root@vanpgvmrac02 bin]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

3). now verify the new disk was marked on both nodes
[root@vmrac01 disks]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

[root@vmrac02 bin]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

4). add new disk to DISKGROUP (under asm instance)
$export ORACLE_SID=+ASM1
$sqlplus / as sysdba
sql>alter diskgroup DG1 add disk 'ORCL:VOL2';
--wait for rebalancing to finish
sql>select * from v$asm_operation;

5). remove old disk from DISKGROUP
sql>alter diskgroup DG1 drop disk VOL1;
--wait until rebalancing has finished
sql>select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR
———— ————— ———— ———- ———- ———-
EST_WORK EST_RATE EST_MINUTES
———- ———- ———–
1 REBAL RUN 1 1 2
1374 30 45
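Before removing VOL1 at the ASMLib level, it is safer to confirm the drop has really completed. A small check along these lines (standard ASM views, not from the original post) works:

sql>select count(*) from v$asm_operation;
--no outstanding rebalance operations should remain
sql>select name, header_status, path from v$asm_disk;
--VOL1 should no longer show as a MEMBER of DG1 (typically FORMER)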

6). verify the database and asm, everything is ok!

7). clean up the old disk configurations
[root@vmrac01 bin]# /etc/init.d/oracleasm deletedisk VOL1
Removing ASM disk “VOL1”: [ OK ]
[root@vmrac01 bin]# /etc/init.d/oracleasm listdisks
VOL2

[root@vmrac02 ~]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]
[root@vmrac02 ~]# /etc/init.d/oracleasm listdisks
VOL2

8). Wipe off the partitions on sdb.
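A minimal way to do the wipe (destructive; only after double-checking that nothing references /dev/sdb any more) is to zero the partition table and re-read it:

# wipe the MBR/partition table of the old LUN (removes the sdb1/sdb2/sdb3 entries)
dd if=/dev/zero of=/dev/sdb bs=512 count=1
# re-read the partition table and confirm it is empty
partprobe /dev/sdb
fdisk -l /dev/sdb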

Reference:
1. Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. [ID 837308.1]
2. Previous doc “VMRAC installation” task 130.2008.09.12
3. OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [ID 428681.1]
4. ASM using ASMLib and Raw Devices
http://www.oracle-base.com/articles/10g/ASMUsingASMLibAndRawDevices.php

November 12, 2009

dbconsole can’t be started with ssl error

Filed under: [RAC] — zhefeng @ 2:36 pm

Ran into a problem where dbconsole failed to start; checking the trace file showed this:
emdctl.trc
———–
2008-09-15 10:58:20 Thread-4136126688 ERROR http: 8: Unable to initialize ssl connection with
server, aborting connection attempt
2008-09-15 10:59:52 Thread-4136126688 ERROR ssl: nzos_Handshake failed, ret=29024.

After searching Metalink, I found you just need to unsecure and re-secure dbconsole to renew the expired dbconsole certificate:

1. Unsecure the Dbconsole
– Unsecure database control using
$ORACLE_HOME/bin>emctl unsecure dbconsole

2. Force an upload:

$ORACLE_HOME/bin> emctl upload

3. Also consider Resecuring the Dbconsole
– Secure database control using
$ORACLE_HOME/bin>emctl secure dbconsole

Starting with 10.2.0.4, HTTPS is used by default.
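After re-securing, the result can be confirmed with the standard emctl status subcommands, for example:

$ORACLE_HOME/bin> emctl status dbconsole
$ORACLE_HOME/bin> emctl status agent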

June 26, 2009

11g rac could not be started

Filed under: [RAC] — zhefeng @ 1:53 pm

Today, after rebooting the RAC node servers, the 11g RAC couldn't be started.
Here are the errors and solutions:

Errors:
1.[root@db03 racg]# crs_stat -t
CRS-0184: Cannot communicate with the CRS daemon.

[root@db03 racg]# crsctl check crs
Failure 1 contacting Cluster Synchronization Services daemon
Cannot communicate with Cluster Ready Services
Cannot communicate with Event Manager

[root@db03 racg]# ps -ef|grep -i init.d
root 3895 1 0 Jun21 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 3896 1 0 Jun21 ? 00:00:00 /bin/sh /etc/init.d/init.cssd fatal
root 3897 1 0 Jun21 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 3961 3895 0 Jun21 ? 00:00:04 /bin/sh /etc/init.d/init.cssd startcheck
root 4031 3896 0 Jun21 ? 00:00:04 /bin/sh /etc/init.d/init.cssd startcheck
root 4123 3897 0 Jun21 ? 00:00:04 /bin/sh /etc/init.d/init.cssd startcheck
root 5230 24639 0 12:58 pts/0 00:00:00 grep -i init.d

–check the system log
[root@db03 racg]# tail -f /var/log/messages
Jun 26 13:15:49 db03 automount[3295]: create_udp_client: hostname lookup failed: Operation not permitted
Jun 26 13:15:49 db03 automount[3295]: create_tcp_client: hostname lookup failed: Operation not permitted
Jun 26 13:15:49 db03 automount[3295]: lookup_mount: exports lookup failed for d
Jun 26 13:15:49 db03 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4031.
Jun 26 13:15:49 db03 logger: Cluster Ready Services waiting on dependencies. Diagnostics in /tmp/crsctl.4031.

–check the trace file
[root@db03 racg]# cat /tmp/crsctl.4031
Oracle Cluster Registry initialization failed accessing Oracle Cluster Registry device: PROC-26: Error while accessing the physical storage Operating System error [Permission denied] [13]

–verify the raw file to see if they are binded
[root@db03 ~]# raw -qa
/dev/raw/raw1: bound to major 8, minor 1
/dev/raw/raw2: bound to major 8, minor 2

–check the permission since the log was mentioning that
[root@db03 ~]# cd /dev/raw
[root@db03 raw]# ls -al
total 0
drwxr-xr-x 2 root root 80 Jun 21 07:08 .
drwxr-xr-x 14 root root 3760 Jun 24 08:17 ..
crw------- 1 root root 162, 1 Jun 21 07:08 raw1
crw------- 1 root root 162, 2 Jun 21 07:08 raw2
–looks like the permission is not correct

–change permissions (on both nodes)
[root@db03 raw]# chown oracle:dba /dev/raw/raw1
[root@db03 raw]# chown oracle:dba /dev/raw/raw2
[root@db03 raw]# chmod 660 /dev/raw/raw1
[root@db03 raw]# chmod 660 /dev/raw/raw2
[root@db03 raw]# chown oracle:dba /dev/sda1
[root@db03 raw]# chown oracle:dba /dev/sda2
[root@db03 raw]# chmod 660 /dev/sda1
[root@db03 raw]# chmod 660 /dev/sda2

--after that, check init.cssd; it's up!
[root@db03 raw]# ps -ef|grep init.d
root 3895 1 0 Jun21 ? 00:00:00 /bin/sh /etc/init.d/init.evmd run
root 3896 1 0 Jun21 ? 00:00:03 /bin/sh /etc/init.d/init.cssd fatal
root 3897 1 0 Jun21 ? 00:00:00 /bin/sh /etc/init.d/init.crsd run
root 7588 3896 0 13:25 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oprocd
root 7606 3896 0 13:25 ? 00:00:00 /bin/sh /etc/init.d/init.cssd oclsomon
root 7630 3896 0 13:25 ? 00:00:00 /bin/sh /etc/init.d/init.cssd daemon
root 20251 6701 0 14:15 pts/0 00:00:00 grep init.d

–check the crs service is also working now
[root@db03 db]# crsctl check crs
Cluster Synchronization Services appears healthy
Cluster Ready Services appears healthy
Event Manager appears healthy

–bring up the rac resources again by using srvctl
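Since the chown/chmod settings on /dev/raw/* do not survive a reboot (which is presumably how they ended up wrong here), it is worth persisting them, for example via /etc/rc.local as in the storage-migration post above. A sketch:

# append to /etc/rc.local on both nodes so the ownership survives reboots
cat >> /etc/rc.local <<'EOF'
chown oracle:dba /dev/raw/raw1 /dev/raw/raw2
chmod 660 /dev/raw/raw1 /dev/raw/raw2
EOF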

Reference:
“why my oracle cluster could not start” http://surachartopun.com/2009/04/why-my-oracle-cluster-could-not-start.html

February 20, 2009

Detailed steps for removing a node from 10gR2 3-nodes RAC

Filed under: [RAC] — zhefeng @ 11:36 am

Background:

we have to remove a node from Oracle 10gR2 RAC.

OS: Redhat EL4

Node name:

db01

db02

db05

Storage: ASM (instance name +ASM)

The 3 most important steps that need to be followed are:

A. Remove the instance using DBCA.

B. Remove the node from the cluster.

C. Reconfigure the OS and remaining hardware.

Here is a breakdown of the above steps.

A. Remove the instance using DBCA.

————————————–

1. Verify that you have a good backup of the OCR (Oracle Cluster Registry) using ocrconfig -showbackup or the dd command.

Run the backup manually with dd for OCR and Voting disk:

[oracle@db01 ocr_voting]$ pwd

/databases/oracle/backup/ocr_voting

[oracle@db01 ocr_voting]$ dd if=/dev/raw/raw1 of=before_del_3rdnode.ocr

[oracle@db01 ocr_voting]$ dd if=/dev/raw/raw2 of=before_del_3rdnode.vote

2. Run DBCA from one of the nodes you are going to keep (db01).  Leave the database up and also leave the departing instance up and running.

3. Choose “Instance Management”

4. Choose “Delete an instance”

5. On the next screen, select the cluster database (p10c) from which you will delete an instance.  Supply the system privilege username and password.

6. On the next screen, a list of cluster database instances will appear.  Highlight the instance you would like to delete then click next.

7. If you have services configured, reassign the services. Modify the services so that each service can run on one of the remaining instances. Set “not used” for each service regarding the instance that is to be deleted. Click Finish.

8. If your database is in archive log mode you may encounter the following errors (10gR2 doesn’t have this issue):

ORA-350
ORA-312

This may occur because the DBCA cannot drop the current log, as it needs archiving. This issue is fixed in the 10.1.0.3 patchset. Prior to that patchset you should click the Ignore button and, when the DBCA completes, manually archive the logs for the deleted instance and drop the log group.

SQL> alter system archive log all;

SQL> alter database drop logfile group 2;

9. Verify that the dropped instance’s redo thread has been removed by querying v$log. If for any reason the redo thread is not disabled, then disable the thread.

SQL> alter database disable thread 2;

10. Verify that the instance was removed from the OCR (Oracle Cluster Registry) with the following commands:

srvctl config database -d <db_name>

[oracle@vanpgprodb01 dbca]$ srvctl config database -d p10c

db01 p10c1 /databases/oracle/db

db02 p10c2 /databases/oracle/db

You can also check with <CRS_HOME>/bin/crs_stat.

11. If this node had an ASM instance and the node will no longer be a part of the cluster you will now need to remove the ASM instance with:

srvctl stop asm -n <nodename>

[oracle@db01 dbca]$ srvctl stop asm -n db05

srvctl remove asm -n <nodename>

[oracle@db01 dbca]$ srvctl remove asm -n db05

Verify that asm is removed with:

srvctl config asm -n <nodename>

[oracle@db01 dbca]$ srvctl config asm -n db05   --the output is empty instead of showing the ASM instance name and ASM home

Delete the ASM directories on the node being deleted:

rm -r $ORACLE_BASE/admin/+ASM
[root@db05 admin]# rm -rf +ASM

rm -f $ORACLE_HOME/dbs/*ASM*

[root@db05 dbs]# rm -rf *ASM*

Remove the ASM library on node3:

[root@db05 db]# /etc/init.d/oracleasm stop

Unmounting ASMlib driver filesystem: [  OK  ]

Unloading module “oracleasm”: [  OK  ]

[root@db05 db]# rpm -qa | grep oracleasm

oracleasmlib-2.0.2-1

oracleasm-2.6.9-55.ELsmp-2.0.3-1

oracleasm-support-2.0.3-1

[root@db05 db]# rpm -ev oracleasm-support-2.0.3-1 oracleasm-2.6.9-55.ELsmp-2.0.3-1 oracleasmlib-2.0.2-1

warning: /etc/sysconfig/oracleasm saved as /etc/sysconfig/oracleasm.rpmsave

[root@db05 db]# rm -f /etc/sysconfig/oracleasm.rpmsave

[root@db05 db]# rm -f /etc/rc.d/init.d/oracleasm

[root@db05 db]# rm -f /etc/rc0.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc1.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc2.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc3.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc4.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc5.d/*oracleasm*

[root@db05 db]# rm -f /etc/rc6.d/*oracleasm*

B. Remove the Node from the Cluster

—————————————-

Once the instance has been deleted, the process of removing the node from the cluster is a manual process. This is accomplished by running scripts on the deleted node to remove the CRS install, as well as scripts on the remaining nodes to update the node list.  The following steps assume that the node to be removed is still functioning.

1. Remove the listener and nodeapps on the node being deleted:

1). To delete the node, first stop and remove the nodeapps on the node you are removing. Assuming that you have already removed the ASM instance, as the root user on a remaining node:

# srvctl stop nodeapps -n <nodename>

[oracle@db01 dbs]$ srvctl stop nodeapps -n db05

[oracle@db01 dbs]$ crs_stat -t

Name           Type           Target    State     Host

————————————————————

ora.p10c.db    application    ONLINE    ONLINE    db02

ora….c1.inst application    ONLINE    ONLINE    db01

ora….c2.inst application    ONLINE    ONLINE    db02

ora….SM1.asm application    ONLINE    ONLINE   db01

ora….01.lsnr application    ONLINE    ONLINE    db01

ora….b01.gsd application    ONLINE    ONLINE    db01

ora….b01.ons application    ONLINE    ONLINE    db01

ora….b01.vip application    ONLINE    ONLINE    db01

ora….SM2.asm application    ONLINE    ONLINE    db02

ora….02.lsnr application    ONLINE    ONLINE    db02

ora….b02.gsd application    ONLINE    ONLINE    db02

ora….b02.ons application    ONLINE    ONLINE    db02
ora….b02.vip application    ONLINE    ONLINE    db02

ora….05.lsnr application    OFFLINE   OFFLINE

ora….b05.gsd application    OFFLINE   OFFLINE

ora….b05.ons application    OFFLINE   OFFLINE

ora….b05.vip application    OFFLINE   OFFLINE

2). Run netca.  Choose “Cluster Configuration”.

3). Only select the node you are removing and click next.

4). Choose “Listener Configuration” and click next.

5). To delete the listener: Choose “Delete” and delete any listeners configured on the node you are removing.

6).  Run <CRS_HOME>/bin/crs_stat.  Make sure that all database resources are running on nodes that are going to be kept.  For example:

NAME=ora.<db_name>.db

TYPE=application

TARGET=ONLINE

STATE=ONLINE on <node2>

[oracle@db05 db]$ crs_stat -t

Name           Type           Target    State     Host

————————————————————

ora.p10c.db    application    ONLINE    ONLINE    db02

ora….c1.inst application    ONLINE    ONLINE    db01

ora….c2.inst application    ONLINE    ONLINE    db02

ora….SM1.asm application    ONLINE    ONLINE    db01

ora….01.lsnr application    ONLINE    ONLINE   db01

ora….b01.gsd application    ONLINE    ONLINE   db01

ora….b01.ons application    ONLINE    ONLINE   db01

ora….b01.vip application    ONLINE    ONLINE    db01

ora….SM2.asm application    ONLINE    ONLINE   db02

ora….02.lsnr application    ONLINE    ONLINE    db02

ora….b02.gsd application    ONLINE    ONLINE    db02

ora….b02.ons application    ONLINE    ONLINE    db02

ora….b02.vip application    ONLINE    ONLINE    db02

ora….b05.gsd application    OFFLINE   OFFLINE

ora….b05.ons application    OFFLINE   OFFLINE

ora….b05.vip application    OFFLINE   OFFLINE

Ensure that this resource is not running on a node that will be removed (this step hasn’t been done in our case since crs_stat shows everything correct). Use <CRS_HOME>/bin/crs_relocate to perform this.

Example:  crs_relocate ora.<db_name>.db

7).  As the root user, remove the nodeapps on the node you are removing.

# srvctl remove nodeapps -n <nodename>

[root@db05 db]# srvctl remove nodeapps -n db05

Please confirm that you intend to remove the node-level applications on node db05 (y/[n]) y

[root@db05 db]# crs_stat -t

Name           Type           Target    State     Host

————————————————————

ora.p10c.db    application    ONLINE    ONLINE    db02

ora….c1.inst application    ONLINE    ONLINE    db01

ora….c2.inst application    ONLINE    ONLINE    db02

ora….SM1.asm application    ONLINE    ONLINE    db01

ora….01.lsnr application    ONLINE    ONLINE    db01

ora….b01.gsd application    ONLINE    ONLINE    db01

ora….b01.ons application    ONLINE    ONLINE    db01

ora….b01.vip application    ONLINE    ONLINE    db01

ora….SM2.asm application    ONLINE    ONLINE    db02

ora….02.lsnr application    ONLINE    ONLINE    db02

ora….b02.gsd application    ONLINE    ONLINE    db02

ora….b02.ons application    ONLINE    ONLINE    db02

ora….b02.vip application    ONLINE    ONLINE    db02

2. Remove the Oracle Database Software from the Node to be Deleted

1). On node3 make sure you have correct ORACLE_HOME

[oracle@db05 db]$ echo $ORACLE_HOME

/databases/oracle/db

2). Update Node List for Oracle Database Software – (Remove node3):

[oracle@db05 bin]$ export DISPLAY=10.50.133.143:0

[oracle@db05 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME CLUSTER_NODES="" -local

Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /databases/oracle/oraInventory

‘UpdateNodeList’ was successful.

Note: Although the OUI does not launch an installer GUI, the DISPLAY  environment variable still needs to be set!

3). De-install Oracle Database Software

Next, run the OUI from the node to be deleted (db05) to de-install the Oracle Database software. Make certain that you choose the home to be removed and not just the products under that home.

4). Update Node List for Remaining Nodes in the Cluster (on any of remaining nodes)

[oracle@db01 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORACLE_HOME "CLUSTER_NODES={vanpgprodb01,vanpgprodb02}"

Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /databases/oracle/oraInventory

‘UpdateNodeList’ was successful.

3. Remove the Node to be Deleted from Oracle Clusterware

1). Remove Node-Specific Interface Configuration

[oracle@db05 db]$ export ORA_CRS_HOME=/databases/oracle/crs

[oracle@db05 db]$ grep '^remoteport' $ORA_CRS_HOME/opmn/conf/ons.config

remoteport=6200

[oracle@db05 bin]$ ./racgons remove_config vanpgprodb05:6200

racgons: Existing key value on vanpgprodb05 = 4948.

WARNING: db05:6200 does not exist.

[oracle@db05 bin]$ ./racgons remove_config db05:4948

racgons: Existing key value on vanpgprodb05 = 4948.

racgons: db05:4948 removed from OCR.

[oracle@db05 bin]$ ./oifcfg delif -node db05

PROC-4: The cluster registry key to be operated on does not exist.

PRIF-11: cluster registry error

Note: this error has been confirmed to be safe to ignore.

4. Disable Oracle Clusterware Applications

Running this script will stop the CRS stack and delete the ocr.loc file on the node to be removed. The nosharedvar option assumes the ocr.loc file is not on a shared file system.

While logged into node3 as the root user account, run the following:

[root@vanpgprodb05 install]# ./rootdelete.sh local nosharedvar nosharedhome

CRS-0210: Could not find resource ‘ora.db05.LISTENER_DB05.lsnr’.

CRS-0210: Could not find resource ‘ora.db05.ons’.

CRS-0210: Could not find resource ‘ora.db05.vip’.

CRS-0210: Could not find resource ‘ora.db05.gsd’.

Shutting down Oracle Cluster Ready Services (CRS):

Feb 17 15:24:19.153 | INF | daemon shutting down

Stopping resources. This could take several minutes.

Successfully stopped CRS resources.

Stopping CSSD.

Shutting down CSS daemon.

Shutdown request successfully issued.

Shutdown has begun. The daemons should exit soon.

Checking to see if Oracle CRS stack is down…

Checking to see if Oracle CRS stack is down…

Oracle CRS stack is not running.

Oracle CRS stack is down now.

Removing script for Oracle Cluster Ready services

Updating ocr file for downgrade

Cleaning up SCR settings in ‘/etc/oracle/scls_scr’

5. Delete Node from Cluster and Update OCR

Upon successful completion of the rootdelete.sh script, run the rootdeletenode.sh script to delete the node (db05) from the Oracle cluster and to update the Oracle Cluster Registry (OCR). This script should be run from a pre-existing / available node in the cluster (node1) as the root user account:

Before executing rootdeletenode.sh, we need to know the node number associated with the node name to be deleted from the cluster. To determine the node number, run the following command as the oracle user account from node1:

[oracle@db01 bin]$ pwd

/databases/oracle/crs/bin

[oracle@db01 bin]$ olsnodes -n

db01    1

db02    2

db05    3

Note: notice the node # from result, we need to use it for removing node.

[root@db01 install]# pwd

/databases/oracle/crs/install

[root@db01 install]# ./rootdeletenode.sh db05,3

CRS-0210: Could not find resource ‘ora.db05.LISTENER_DB05.lsnr’.

CRS-0210: Could not find resource ‘ora.db05.ons’.

CRS-0210: Could not find resource ‘ora.db05.vip’.

CRS-0210: Could not find resource ‘ora.db05.gsd’.

CRS-0210: Could not find resource ora.db05.vip.

CRS nodeapps are deleted successfully

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Successfully deleted 14 values from OCR.

Key SYSTEM.css.interfaces.nodevanpgprodb05 marked for deletion is not there. Ignoring.

Successfully deleted 5 keys from OCR.

Node deletion operation successful.

‘db05,3’ deleted successfully

[root@db01 install]# ../bin/olsnodes -n

db01    1

db02    2

6. Update Node List for Oracle Clusterware Software – (Remove node3)

From Node3 as Oracle user:

[oracle@db05 bin]$ pwd

/databases/oracle/crs/oui/bin

[oracle@db05 bin]$ export ORA_CRS_HOME=/databases/oracle/crs

[oracle@db05 bin]$ export DISPLAY=10.50.133.143:0

[oracle@db05 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORA_CRS_HOME CLUSTER_NODES="" -local CRS=true

Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /databases/oracle/oraInventory

‘UpdateNodeList’ was successful.

7. De-install Oracle Clusterware Software

[oracle@db05 bin]$ pwd

/databases/oracle/crs/oui/bin

[oracle@db05 bin]$ ./runInstaller

After deleting the CRS software, the directory will be deleted as well.

8. Update Node List for Remaining Nodes in the Cluster

[oracle@db01 bin]$ export ORA_CRS_HOME=/databases/oracle/crs

[oracle@db01 bin]$ export DISPLAY=10.50.133.143:0

[oracle@db01 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=$ORA_CRS_HOME "CLUSTER_NODES={db01,db02}" CRS=true

Starting Oracle Universal Installer…

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.

The inventory pointer is located at /etc/oraInst.loc

The inventory is located at /databases/oracle/oraInventory

‘UpdateNodeList’ was successful.

C. Reconfigure the OS and remaining hardware.

————————————————-

1. Check the tnsnames.ora on the remaining nodes, if it exists, for entries that still reference the deleted node (a quick check is sketched below).
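A quick way to spot leftover references on the surviving nodes (the path assumes the default TNS_ADMIN location; adjust as needed):

# as oracle on each remaining node: find connect descriptors still pointing at db05
grep -n -i 'db05' $ORACLE_HOME/network/admin/tnsnames.ora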

2. Delete oracle_home and crs_home

3. Next, as root, from the deleted node, verify that all init scripts and soft links are removed:

For Linux:

rm -f /etc/init.d/init.cssd

rm -f /etc/init.d/init.crs

rm -f /etc/init.d/init.crsd

rm -f /etc/init.d/init.evmd

rm -f /etc/rc2.d/K96init.crs

rm -f /etc/rc2.d/S96init.crs

rm -f /etc/rc3.d/K96init.crs

rm -f /etc/rc3.d/S96init.crs

rm -f /etc/rc5.d/K96init.crs

rm -f /etc/rc5.d/S96init.crs

rm -Rf /etc/oracle

rm -f /etc/oratab

Reference:

1. Removing a Node from an Oracle RAC 10g Release 2 Cluster on Linux – (CentOS 4.5 / iSCSI)

by Jeff Hunter http://www.idevelopment.info/data/Oracle/DBA_tips/Oracle10gRAC/CLUSTER_23.shtml

2. How to delete a node from 3 node RAC in 10GR2

http://www.oraclefaq.net/2007/06/21/how-to-delete-a-node-from-3-node-rac-in-10gr2/

3. 10 Adding and Deleting Nodes and Instances on UNIX-Based Systems

Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide 10g Release 2 (10.2) Part Number B14197-09

http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/adddelunix.htm#BEIJAJHH

4. Removing a Node from a 10g RAC Cluster  Doc ID: 269320.1

https://metalink2.oracle.com/metalink/plsql/f?p=130:14:4196773229713543167::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,269320.1,1,1,1,helvetica

February 2, 2009

Change IP address for oracle RAC public and VIP interfaces

Filed under: [RAC] — zhefeng @ 4:33 pm

My company is doing a massive re-IP project, so our Oracle RAC IP addresses have to be changed as well. Fortunately, we don't need to change the hostnames; otherwise the story would be more complicated.

1. Current Environment
1). Machine IP:
Node1: vmrac01
Node2: vmrac02

## eth1-Public
10.50.96.101 vmrac01 vmrac01.test.com
10.50.96.102 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.50.96.103 vmracvip01 vmracvip01.test.com
10.50.96.104 vmracvip02 vmracvip02.test.com

2). cluster information:
cluster name — vm10cls
database name — v10c
Instance 1 — v10c1
Instance 2 — v10c2
Node1 — vmrac01
Node2 — vmrac02

2. New IP changing map(different subnet mask too):
10.50.96.101/255.255.255.0 vmrac01 –> 10.50.99.41/255.255.252.0
10.50.96.102/255.255.255.0 vmrac02 –> 10.50.99.42/255.255.252.0
10.50.96.103/255.255.255.0 vmracvip01 –> 10.50.99.43/255.255.252.0
10.50.96.104/255.255.255.0 vmracvip02 –> 10.50.99.44/255.255.252.0

3. steps 1 — change RAC IP settings
1). Bring the services down; make sure everything is offline except the CSS daemon:
bash-3.1$ srvctl stop database -d v10c
bash-3.1$ srvctl stop nodeapps -n vmrac01
bash-3.1$ srvctl stop nodeapps -n vmrac02
bash-3.1$ crs_stat -t
Name Type Target State Host
————————————————————
ora.v10c.db application OFFLINE OFFLINE
ora….c1.inst application OFFLINE OFFLINE
ora….c2.inst application OFFLINE OFFLINE
ora….SM1.asm application OFFLINE OFFLINE
ora….01.lsnr application OFFLINE OFFLINE
ora….c01.gsd application OFFLINE OFFLINE
ora….c01.ons application OFFLINE OFFLINE
ora….c01.vip application OFFLINE OFFLINE
ora….SM2.asm application OFFLINE OFFLINE
ora….02.lsnr application OFFLINE OFFLINE
ora….c02.gsd application OFFLINE OFFLINE
ora….c02.ons application OFFLINE OFFLINE
ora….c02.vip application OFFLINE OFFLINE

2). backup OCR and Voting disks
bash-3.1$ ocrcheck|grep -i file
Device/File Name : /dev/raw/raw1
bash-3.1$ crsctl query css votedisk
0. 0 /dev/raw/raw2
located 1 votedisk(s).

#dd if=/dev/raw/raw1 of=/database/temp/ocr_vote_bk/ocr.bak
#dd if=/dev/raw/raw2 of=/database/temp/ocr_vote_bk/vote.bak

3). get current config:
bash-3.1$ oifcfg getif
eth0 10.50.96.0 global public –current network for public
eth1 192.168.199.0 global cluster_interconnect –we are not going to change this

4). delete current public ip:
bash-3.1$ oifcfg delif -global eth0

5). change to new network:
bash-3.1$ oifcfg setif -global eth0/10.50.99.0:public

6). change vip address:
a. check current one
bash-3.1$ srvctl config nodeapps -n vmrac01 -a
VIP exists.: /vmracvip01/10.50.96.103/255.255.255.0/eth0
bash-3.1$ srvctl config nodeapps -n vmrac02 -a
VIP exists.: /vmracvip02/10.50.96.104/255.255.255.0/eth0
b. Modify the VIP component (this has to be done as the CSS owner, usually "root"):
#srvctl modify nodeapps -n vmrac01 -A 10.50.99.43/255.255.252.0/eth0
#srvctl modify nodeapps -n vmrac02 -A 10.50.99.44/255.255.252.0/eth0
c. double verify the changes
bash-3.1$ srvctl config nodeapps -n vmrac01 -a
VIP exists.: /vmracvip01/10.50.99.43/255.255.252.0/eth0
bash-3.1$ srvctl config nodeapps -n vmrac02 -a
VIP exists.: /vmracvip02/10.50.99.44/255.255.252.0/eth0

7). change the hosts file (on both nodes):
## eth1-Public
10.50.99.41 vmrac01 vmrac01.test.com
10.50.99.42 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.50.99.43 vmracvip01 vmracvip01.test.com
10.50.99.44 vmracvip02 vmracvip02.test.com

8). If the listener is configured with a literal IP address, it also needs to be changed; see the sketch below.
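For example, if listener.ora pins HOST to an address instead of a hostname, the address has to be updated to the new one. A hypothetical snippet, not taken from this cluster's actual listener.ora:

# HOST was 10.50.96.103 before the re-IP
LISTENER_VMRAC01 =
  (DESCRIPTION_LIST =
    (DESCRIPTION =
      (ADDRESS = (PROTOCOL = TCP)(HOST = 10.50.99.43)(PORT = 1521))
    )
  )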

4. Steps 2 — change OS IP settings
1). change IP
[root@vmrac01]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.50.99.41
NETMASK=255.255.252.0
HWADDR=00:50:56:BD:05:14
ONBOOT=yes

[root@vmrac02]# vi /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BOOTPROTO=static
IPADDR=10.50.99.42
NETMASK=255.255.252.0
HWADDR=00:50:56:BD:3E:08
ONBOOT=yes

2). Change the default gateway on both nodes (if needed; here they are in the same VLAN, so I didn't change it)
[root@vmrac01 ~]# cat /etc/sysconfig/network
NETWORKING_IPV6=yes
HOSTNAME=vmrac01
NETWORKING=yes
NISDOMAIN=nis
GATEWAY=10.50.96.1 <– here is the default gateway to be changed

3). Change the IP addresses in the known_hosts ssh file for the oracle user
$ su – oracle
$ cd .ssh
$ cp known_hosts known_hosts.bak
Then edit known_hosts and replace the old IPs with the new ones (a sed one-liner is sketched below).
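For instance (a hedged example covering only the two public IPs; repeat the pattern for the VIP addresses if they appear in the file):

$ sed -i.bak -e 's/10\.50\.96\.101/10.50.99.41/g' -e 's/10\.50\.96\.102/10.50.99.42/g' known_hosts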

4). restart network (on both node)
#service network restart

5). restart crs daemon (on both node)
#crsctl stop crs
#crsctl start crs

5. Step3 — verify everything

reference:
1. "How to Change Interconnect/Public Interface IP or Subnet in Oracle Clusterware", Doc ID: 283684.1
2. "Modifying the VIP or VIP Hostname of a 10g or 11g Oracle Clusterware Node", DOC ID: 276434.1
3. "How to change Public and VIP component address in case of RAC?" http://orcl-experts.info/index.php?name=FAQ&id_cat=9

December 8, 2008

Oracle RAC-stop crs autostart on one node

Filed under: [RAC] — zhefeng @ 12:24 pm

***Background:
One of the RAC nodes, vanpgprodb05, keeps having system-level issues, and every time we reboot the machine, CRS on that machine restarts automatically. We don't want that to happen while we are fixing it, so temporarily disabling CRS autostart on that node is necessary.

***Related document on metalink:
Oracle notes: Note:298073.1

The CRS profiles for these resources must be modified for the new CRS behavior to take effect for all RAC databases installed on the cluster. If, however, the effect of the change is to be limited to a subset of installed databases, the list of resources needs to be filtered further. (The rest of this section should be skipped if the new CRS behavior is to be in effect for all databases installed on the cluster.)

Please note that since more than one database may be installed on a cluster, to modify the level of protection for a particular database, one must identify the resources that represent entities of this database. This may be easily accomplished since the names of the resources belonging to the above-stated types always start with ora.<database_name>. For instance, ora.linux.db means that the resource belongs to the database named linux. Only resources of the above-enumerated types that belong to the selected databases will need to have their profiles modified.

MODIFYING RESOURCE PROFILES

Please note that Oracle strongly discourages any modifications made to CRS profiles for any resources starting with "ora.". Never make any modifications to CRS profiles for resources other than the ones explicitly described below in this document.

To modify a profile attribute for a resource, the following steps must be
followed:
1.Generate the resource profile file by issuing the following command:
crs_stat -p resource_name > $CRS_HOME/crs/public/resource_name.cap

2.Update desired attributes by editing the file created in step 1.

3.Commit the updates made as a part of the previous step by issuing the
following command
crs_register -u resource_name

4. Verify the updates have been committed by issuing the following command
crs_stat -p resource_name

For each of the resources identified as a part of the preceding section, the following modifications must be made:
1. Resources of type inst must have the following attributes modified:
AUTO_START must be set to 2
RESTART_ATTEMPTS must be set to 0 or 1. The former value will prevent CRS from attempting to restart a failed instance at all, while the latter will grant it a single attempt; if this only attempt is unsuccessful, CRS will leave the instance as is.
2. Resources of type db, srv, cs must have the following attribute modified:
AUTO_START must be set to 2

***Practice steps:
1. Objective: to disable the crs autostart on node vanpgprodb05
2. catch all the resources profile on vanpgprodb05
1). From vanpgprodb01 (at that time vanpgprodb05 is down), run these commands to capture the profiles:
$crs_stat -p ora.p10c.p10c5.inst>$CRS_HOME/crs/public/ora.p10c.p10c5.inst.cap
$crs_stat -p ora.vanpgprodb05.ASM3.asm>$CRS_HOME/crs/public/ora.vanpgprodb05.ASM3.asm.cap
$crs_stat -p ora.vanpgprodb05.gsd>$CRS_HOME/crs/public/ora.vanpgprodb05.gsd.cap
$crs_stat -p ora.vanpgprodb05.LISTENER_VANPGPRODB05.lsnr>$CRS_HOME/crs/public/ora.vanpgprodb05.LISTENER_VANPGPRODB05.lsnr.cap
$crs_stat -p ora.vanpgprodb05.ons>$CRS_HOME/crs/public/ora.vanpgprodb05.ons.cap
$crs_stat -p ora.vanpgprodb05.vip>$CRS_HOME/crs/public/ora.vanpgprodb05.vip.cap

2). Edit these .cap files and change the AUTO_START parameter to 2 (so the resource does not autostart), for example:
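A minimal sketch of that edit (it assumes the .cap files generated in step 1) and GNU sed; RESTART_ATTEMPTS for the inst resource can be handled the same way if desired, per the note above):

$cd $CRS_HOME/crs/public
$sed -i 's/^AUTO_START=.*/AUTO_START=2/' ora.p10c.p10c5.inst.cap ora.vanpgprodb05.ASM3.asm.cap ora.vanpgprodb05.gsd.cap ora.vanpgprodb05.LISTENER_VANPGPRODB05.lsnr.cap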

3). crs_register -u resource_name
$crs_register -u ora.p10c.p10c5.inst
$crs_register -u ora.vanpgprodb05.ASM3.asm
$crs_register -u ora.vanpgprodb05.gsd
$crs_register -u ora.vanpgprodb05.LISTENER_VANPGPRODB05.lsnr

4). use crs_stat -p resource_name to verify the parameter has been changed
crs_stat -p|grep AUTO_START=2 -B 7

5). Reboot the node vanpgprodb05 now; after the reboot:
[oracle@vanpgprodb01 public]$ crs_stat -t
Name Type Target State Host
————————————————————
ora.p10c.db application ONLINE ONLINE vanp…db02
ora….c1.inst application ONLINE ONLINE vanp…db01
ora….c2.inst application ONLINE ONLINE vanp…db02
ora….c5.inst application OFFLINE OFFLINE
ora….SM1.asm application ONLINE ONLINE vanp…db01
ora….01.lsnr application ONLINE ONLINE vanp…db01
ora….b01.gsd application ONLINE ONLINE vanp…db01
ora….b01.ons application ONLINE ONLINE vanp…db01
ora….b01.vip application ONLINE ONLINE vanp…db01
ora….SM2.asm application ONLINE ONLINE vanp…db02
ora….02.lsnr application ONLINE ONLINE vanp…db02
ora….b02.gsd application ONLINE ONLINE vanp…db02
ora….b02.ons application ONLINE ONLINE vanp…db02
ora….b02.vip application ONLINE ONLINE vanp…db02
ora….SM3.asm application OFFLINE OFFLINE
ora….05.lsnr application OFFLINE OFFLINE
ora….b05.gsd application OFFLINE OFFLINE
ora….b05.ons application ONLINE ONLINE vanp…db05
ora….b05.vip application ONLINE ONLINE vanp…db05

Note: for ONS, the autostart setting is "AUTO_START=always", and the VIP can't be taken offline this way

6). The disabling works! Don't forget to change the autostart back when the problem gets resolved (repeat steps 2)-4)).

Add-on: if you want to stop the CSS daemon as well, do this:
/etc/init.d/init.cssd stop

If you want to stop it from autostarting after boot:
[root@vanpgprodb06 rc3.d]# /etc/init.d/init.crs disable
Automatic startup disabled for system boot.
