Oracle Database Extended RAC

Oracle 10g Extended RAC on

Windows 2003

Handbook

 For Achieving HA & Disaster Recovery Solution

By


Database Manager




Index

Introduction                                                                                 

Test Bed overview                                                                     

Installation of Virtual Machines                                         

Installation of Oracle Clusterware Services                  

Installation of Oracle Software/ASM/DB                           

Post RAC Installation Health Check                                    

Applying Oracle 10.2.0.3 Patch set                                    

Adding a third Node to the Cluster Database                  

RAC Concepts Primer                                                                 

Troubleshooting RAC Environment                                          

Backup and Recovery – RAC Environment                              

References   



Introduction

This document is intended as a guideline for Oracle and system administrators who are responsible for implementing an Oracle Extended RAC (also known as stretched clustering) across nodes located within 5 km of each other. I have implemented this on IBM P590 series servers running AIX 5.3 with Oracle 10g Release 10.2.0.3, using Oracle Clusterware on SAN storage, with the two data centers connected over a 1GB Laser Link network with latency of less than 5 ms. We had some problems implementing the solution, and most of the issues were related to configuring the Oracle RAC components that deal with network synchronization.

At that time, I decided to first test the Extended RAC implementation fully on my own, and thanks to VMware software I was able to do so. This document is based on that VMware installation on Windows. I highly recommend that, before you implement Extended RAC on Unix in a UAT or production environment, you test it fully on Windows to get the concepts clear.

 

For the RAC Handbook on IBM P590 Series (AIX), please refer to the RAC resources section on my site.



Test Bed overview

Oracle 10g Release 10.2.0.1 (later we will apply the latest patch set, 10.2.0.3)

 

Windows 2003 Enterprise Operating System Service Pack 1

 

Windows resource kit to be installed on every virtual machine.

 

VMware Workstation Version 4.5 (you can download a trial version from the VMware site, but I highly recommend buying it).

 

Four Windows XP workstation PCs connected to a network with at most 5 ms latency, each with 1GB RAM, 1 CPU, and at least 40GB of free storage space.

 

There are many articles on the internet about implementing RAC on a single PC with VMware installed. However, they all need at least 2GB of memory, and even with that much, once RAC is installed you cannot test all possible RAC scenarios because of the lack of resources. What I did, and recommend, is to talk to the other DBAs at your workplace and arrange for four DBAs to lend their PCs to simulate the Extended RAC test. All you need is four PCs with 1GB of memory each, connected over a LAN, with administrative privileges on each workstation.


Installation of Virtual Machines  

 

Let's call the four Windows XP workstations:

XPWS1, XPWS2, XPWS3, & XPWS4

On each of these workstations, install the VMware Workstation software, launch it, and create one Windows 2003 virtual machine. While creating each virtual machine, make sure to apply the following settings:

·         On XPWS1, create folder C:\RACVIRTUAL and under it create two subfolders as RAC1 and ASMDISK.

·         On XPWS2, create folder D:\RACVIRTUAL and under it create two subfolders as RAC3 and ASMDISK.

·         On XPWS3, create folder D:\RACVIRTUAL and under it create 1 subfolder as RAC5.

·         The above folders hold the new virtual servers running Windows 2003. The ASMDISK folder hosts the ASM raw devices.

·         Virtual OS should be assigned with 524MB of RAM each.

·         During creation of virtual servers, choose bridged network, IO Adapter as LSI Logic, disks as SCSI.

·         Under Virtual machine settings, remove Drive A:

·         As for windows 2003, choose default settings.

·         Make sure the swap space is 1GB and is placed on drive C:

·         Create two logical drives, C: and D:. C: should be 4GB for the OS, and D: should be 5GB, where we will install the Oracle 10g software.

·         After the installation is over, choose the VM menu option and install VMware Tools for the new virtual machine/OS.

 

Workstation    Virtual Server    Remarks
XPWS1          RAC1              RAC Node1 with its storage defined
XPWS2          RAC3              RAC Node2 with its storage defined
XPWS3          RAC5              RAC Node3 with no storage of its own
XPWS4          RAC6              Only storage for 3rd voting disk

          

 

At this point you will have four XP workstations, each running a single virtual machine with Windows 2003. Let us now focus on the first two virtual machines. The remaining two will be used later for a) adding a third node and b) adding a third voting disk/site, respectively. Therefore, for now you will work with the two virtual machines that I named RAC1 (on PC XPWS1) and RAC3 (on PC XPWS2). RAC5 (on XPWS3) will be configured later as the third node.

 

You now need to configure the network settings for these two virtual servers. Shut down the RAC1 and RAC3 servers (from now on, RAC1 and RAC3 refer to the virtual servers created on two separate workstations connected over your home or office network). This point is very important, because all of my configurations below depend on it: do not create the two virtual machines on the same physical PC; create them on different physical machines. If you would like to proceed with single-PC RAC testing, you should see other articles on the internet.

 

Let's now proceed with the NIC settings:

·         Shut down RAC1, edit the virtual machine settings, and add a new Ethernet adapter of Bridged type. Bring up the RAC1 server and you should see a Found New Hardware window. Press Next and complete the wizard. It will fail at the end, but let it be: if you press Cancel instead, the same Found New Hardware screen will appear every time you reboot RAC1.

·         Go to Network Connections from the Control Panel and you should see two NICs, identified as Local Area Connection 1 and Local Area Connection 2. Rename them Public and Private respectively. Open the properties of each, choose Internet Protocol, and assign the IP addresses. Make sure the subnet is different for Public and Private, because the Private NIC will be used for Cache Fusion and inter-node communication between the two RAC nodes. In the Advanced Settings of the Network Connections window, make sure the order of the list is Public first and then Private, so that the host name RAC1 resolves to the Public NIC. My settings are:

 

RAC1 Node:

     Public  NIC:    10.10.142.139, subnet:255.0.0.0

                     Gateway:10.10.142.1

     Private NIC:    10.10.0.139, subnet:255.0.0.0

                     Gateway: leave empty

 

Repeat the same for RAC3 node.

RAC3 Node:

     Public  NIC:    10.10.42.72, subnet:255.0.0.0

                     Gateway:10.10.142.1

     Private NIC:    10.10.0.72, subnet:255.0.0.0

                     Gateway: leave empty

 

·         Modify the hosts file (c:\windows\system32\drivers\etc\hosts) on both RAC1 and RAC3 as follows:

 

10.10.142.139   RAC1

10.10.0.139     RAC1-priv

10.10.42.250    RAC1-vip

 

10.10.42.72     RAC3

10.10.0.72      RAC3-priv

10.10.42.251    RAC3-vip

 

RAC1-VIP and RAC3-VIP are not physically bound to any network card but are logically defined on the public subnet. Once the RAC setup is complete, these are the IP addresses (or names) that will be used to configure client connections to the RAC. They act as components of the cluster, so if Node1 goes down, its VIP fails over to the surviving node and clients are redirected to the surviving node without the TCP timeout that usually occurs when a listener is listening on a port attached to a physical IP address.
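The naming and subnet rules above can be checked with a short script before the installer is run. The following is a minimal sketch in Python, not part of the original procedure: the function names are hypothetical, and the 255.255.255.0 mask is used only for comparison and is an assumption, not a setting from this handbook. It verifies that every node has a public, -priv and -vip entry, and reports whether a public and a private address fall into the same network under a given mask.

```python
import ipaddress

# Entries copied from the hosts file shown above.
HOSTS_TEXT = """\
10.10.142.139   RAC1
10.10.0.139     RAC1-priv
10.10.42.250    RAC1-vip
10.10.42.72     RAC3
10.10.0.72      RAC3-priv
10.10.42.251    RAC3-vip
"""


def parse_hosts(text):
    """Return {lower-case host name: IP address} from hosts-file text."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        fields = line.split()
        addr = ipaddress.ip_address(fields[0])
        for name in fields[1:]:
            entries[name.lower()] = addr
    return entries


def missing_names(entries, nodes):
    """List the public, -priv and -vip names that have no hosts entry."""
    wanted = [n.lower() + s for n in nodes for s in ("", "-priv", "-vip")]
    return [w for w in wanted if w not in entries]


def same_network(addr_a, addr_b, netmask):
    """True if both addresses belong to the same network under netmask."""
    net_a = ipaddress.ip_interface(f"{addr_a}/{netmask}").network
    net_b = ipaddress.ip_interface(f"{addr_b}/{netmask}").network
    return net_a == net_b


if __name__ == "__main__":
    entries = parse_hosts(HOSTS_TEXT)
    print("missing names:", missing_names(entries, ["RAC1", "RAC3"]))
    # 255.255.255.0 is an illustrative assumption, not the author's setting.
    for mask in ("255.0.0.0", "255.255.255.0"):
        clash = same_network(entries["rac1"], entries["rac1-priv"], mask)
        print(mask, "-> public and private share a network:", clash)
```

Note that with the 255.0.0.0 mask listed in the settings above, the public and private addresses both fall inside 10.0.0.0/8, so the check reports them as sharing a network. If the two are meant to be on different subnets, a narrower mask is worth considering.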

     Perform following steps for all nodes (RAC1, RAC3, RAC5, RAC6)

·         Oracle supports the TCP/IP protocol for the public and private networks and requires that Windows Media Sensing be disabled by setting the DisableDHCPMediaSense parameter to 1. To do this, open the Windows registry with Regedit.exe, navigate to HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Tcpip\Parameters, and add a DWORD value named DisableDHCPMediaSense set to 1. To learn more about this parameter, see http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/regentry/94173.mspx?mfr=true

·         Go to the command prompt and type:

set devmgr_show_nonpresent_devices=1

devmgmt.msc

In Device Manager, choose View > Show hidden devices, then remove any greyed-out NIC under the Network adapters category.

·         From the Control Panel, open Add/Remove Windows Components and add the Terminal Services component. This lets you use Remote Desktop to access both nodes from a remote laptop or workstation if required.
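Since the DisableDHCPMediaSense change has to be repeated on every node, it can help to generate a .reg file once and import it on each machine. The sketch below is an illustration under stated assumptions: build_reg_file is a hypothetical helper, and the output follows the standard "Windows Registry Editor Version 5.00" text format. Review the generated file before importing it.

```python
# Registry key and value taken from the step above.
TCPIP_PARAMS = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters"


def build_reg_file(key_path, value_name, dword_value):
    """Return .reg text that sets one DWORD value under key_path."""
    if not 0 <= dword_value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key_path}]",
        # DWORD values are written as eight hexadecimal digits.
        f'"{value_name}"=dword:{dword_value:08x}',
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file(TCPIP_PARAMS, "DisableDHCPMediaSense", 1))
```

The printed text can be saved as, for example, mediasense.reg (a hypothetical file name) and imported on each node by double-clicking it or through Regedit.exe.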

 

Shut down both servers (RAC1, RAC3) from their respective workstations (you could work from XPWS1 and open a remote connection to XPWS2 from there, which is what I did). Also create a mapped drive from each workstation to the other's C:\RACVIRTUAL (or D:\RACVIRTUAL) folder, with full permissions granted. The mapped drive can be called Z: (for example, on XPWS2, Z: maps to C:\RACVIRTUAL on XPWS1). Now edit the main virtual server file C:\RACVIRTUAL\RAC1\winNetEnterprise.vmx on XPWS1 (for the RAC1 node settings) and add the following lines:

 

disk.locking = "FALSE"

diskLib.dataCacheMaxSize = "0"

diskLib.dataCacheMaxReadAheadSize = "0"

diskLib.dataCacheMinReadAheadSize = "0"

diskLib.dataCachePageSize = "4096"

diskLib.maxUnsyncedWrites = "0"

 

scsi1.present = "TRUE"

scsi1.virtualDev = "lsilogic"

scsi1.sharedBus = "VIRTUAL"

·         Repeat the same in Z:\RAC3\winNetEnterprise.vmx (reached through the mapped drive) for the virtual server RAC3 created on XPWS2.
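Because these lines have to be added by hand to each node's .vmx file, a quick check helps catch a missing or mistyped entry. The following is a minimal sketch: the helper names are hypothetical, the required values are the ones listed above, and treating .vmx keys as case-insensitive is an assumption based on the mixed-case keys seen in the author's files.

```python
# Settings listed above that allow both virtual machines to share disks.
REQUIRED = {
    "disk.locking": "FALSE",
    "diskLib.dataCacheMaxSize": "0",
    "diskLib.dataCacheMaxReadAheadSize": "0",
    "diskLib.dataCacheMinReadAheadSize": "0",
    "diskLib.dataCachePageSize": "4096",
    "diskLib.maxUnsyncedWrites": "0",
    "scsi1.present": "TRUE",
    "scsi1.virtualDev": "lsilogic",
    "scsi1.sharedBus": "VIRTUAL",
}


def parse_vmx(text):
    """Parse 'key = "value"' lines into a dict with lower-case keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Assumption: keys are compared case-insensitively.
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def missing_settings(settings, required=REQUIRED):
    """Return (key, expected, actual) for each absent or different setting."""
    problems = []
    for key, expected in required.items():
        actual = settings.get(key.lower())
        if actual is None or actual.upper() != expected.upper():
            problems.append((key, expected, actual))
    return problems


if __name__ == "__main__":
    sample = 'disk.locking = "FALSE"\nscsi1.present = "TRUE"\n'
    for key, expected, actual in missing_settings(parse_vmx(sample)):
        print(f"{key}: expected {expected}, found {actual}")
```

To check a real file, read its contents (for example with open(path).read()) and pass the text to parse_vmx.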

 

We are now ready to create the raw devices. Since this is an Extended RAC setup, I will create a set of raw devices on the storage of both workstations (XPWS1, XPWS2) via VMware. Each set will have three raw devices: one for database files, one for the OCR, and one for the voting disk. Later, when configuring ASM with normal redundancy, I will mirror these two sets of raw devices (created on two different PCs' storage). Later I will also show how to create additional raw disks so that different database files (such as redo logs and backup sets) can be moved to their own raw disks.

 

From RAC1 node: Go to command prompt.

Raw Device to hold OCR information

vmware-vdiskmanager.exe -c -s 300MB -a lsilogic -t 2 C:\RACVIRTUAL\ASMDISK\ocr1.vmdk

 

Raw Device to hold Voting disk information

vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 C:\RACVIRTUAL\ASMDISK\votingdisk1.vmdk

 

The OCR and voting disks are explained in the next section.

 

Raw Device to hold Oracle Database Files.

You need to use the VMware GUI to create this one. I had run out of SCSI slots for virtual devices, so I chose an IDE hard disk as the new virtual disk. Make sure you pre-allocate the disk space, and choose a size of 4GB. Create the raw disk as C:\RACVIRTUAL\ASMDISK\oradata1.vmdk

 

Repeat the same from RAC3 node (on XPWS2) as:

vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 D:\RACVIRTUAL\ASMDISK\ocr2.vmdk

vmware-vdiskmanager.exe -c -s 200MB -a lsilogic -t 2 D:\RACVIRTUAL\ASMDISK\votingdisk2.vmdk

 

And now create D:\RACVIRTUAL\ASMDISK\oradata2.vmdk via GUI as IDE hard disk.

 

On XPWS2 I used drive D: because that PC has more free space there; for the RAC1 node on XPWS1 I used drive C:.

Also note that the names of these raw devices end with the digit 2, because they will be used together with the raw devices created on XPWS1 and mirrored by ASM.
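The vmware-vdiskmanager commands above follow one pattern per site, so they can be generated rather than typed. The sketch below is only an illustration: the function names are hypothetical, and it reproduces the flags (-c, -s, -a, -t 2), folders and sizes exactly as they appear in this handbook, including the 300MB size used for ocr1 and the 200MB size used for ocr2.

```python
def vdisk_command(size, path, adapter="lsilogic", disk_type=2):
    """Build one vmware-vdiskmanager command line as used above."""
    return f"vmware-vdiskmanager.exe -c -s {size} -a {adapter} -t {disk_type} {path}"


def commands_for_site(folder, suffix, sizes):
    """Build the commands for one site.

    folder: ASMDISK folder on that workstation
    suffix: digit appended to each disk name (1 for XPWS1, 2 for XPWS2)
    sizes:  {disk base name: size string}, in creation order
    """
    commands = []
    for name, size in sizes.items():
        path = folder + "\\" + name + suffix + ".vmdk"
        commands.append(vdisk_command(size, path))
    return commands


if __name__ == "__main__":
    site1 = commands_for_site(r"C:\RACVIRTUAL\ASMDISK", "1",
                              {"ocr": "300MB", "votingdisk": "200MB"})
    site2 = commands_for_site(r"D:\RACVIRTUAL\ASMDISK", "2",
                              {"ocr": "200MB", "votingdisk": "200MB"})
    for line in site1 + site2:
        print(line)
```

The printed lines can be pasted into a command prompt on the matching workstation. The oradata disks are not included because they are created through the VMware GUI as IDE disks.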

 

Now you need to make the new disks available to the virtual machines by editing the virtual machine settings as follows:

 

·         Bring up both nodes, RAC1 and RAC3, and perform the following:

1.   Go to the command prompt, type diskpart, and then enter the command automount enable. This is required to make sure the raw devices are automatically mounted every time the OS starts up.

2.   The two RAC nodes must have their clocks synchronized. You can download third-party software that keeps the clocks of different servers on one network in sync, or you can use the net time command to take the time from any time server available on the internet. Search Google for a Windows time sync server.

3.   What I did was simply sync the time of each virtual server to a host OS (the XP workstation XPWS1). From the RAC1 node:

Check current Time Server with: NET TIME /QUERYSNTP

Set the initial time from the time server:

NET TIME \\XPWS1 /SET

 

Set the current time server (XPWS1) for RAC1:

NET TIME /SETSNTP:XPWS1

Repeat the same for RAC3 node and make XPWS1 as its Time Server as well.

Then check times on both servers from one place as:

NET TIME \\RAC1

NET TIME \\RAC3

Alternatively, right-click the VMware Tools icon and select time synchronization between the host and the virtual machine; if you choose this option, make sure the Windows Time service is disabled.

However, you need to run the above commands every time the machine starts up, so make these two commands part of a scheduled job that is triggered at every system startup.

·         For the RAC1 node, the raw device that holds the database files (oradata1.vmdk) is already registered, because you created it via the VMware GUI. The OCR and voting disks were created from the command prompt, so add them through the VMware GUI (Add Disks): this time, instead of creating a new virtual disk, choose to create from an existing one, browse to the location of these two files, and select the vmdk files.

·         Start up the RAC1 node, right-click the My Computer shortcut on the desktop, choose Manage, select the Storage section, and click Disk Management. A popup window should list the three new disks you added. Accept the defaults, press Next, accept the defaults again, and finish. You should now see your three disks appear as offline in Disk Management. For each of them in turn, perform the following tasks:

1.   Right-click the new disk, select New Partition, choose Extended, and proceed to finish.

2.   Right-click again and choose Create Logical Drive. Make sure to choose 'Do not assign a drive letter' and 'Do not format the disk', and continue until completion. You should now see the disk with Online status. Repeat the same for the remaining two disks (OCR and voting).

·         Shut down both the RAC1 and RAC3 nodes.

·         Since the RAC1 node also needs to access the disks created on the RAC3 node, repeat the same procedure as above, but this time, when you add the disks from existing files, choose the remote location Z:\... and add the three raw devices from the RAC3 node.

·         Repeat the whole Procedure as described above for RAC3.

·         Bring up both nodes and verify all storage settings.

 

 

For reference, I have copied the contents of the vmx files for both nodes below:

 

RAC1 Node vmx file (Notice remote raw device links with Z:)

Location: C:\RACVIRTUAL\RAC1\winNetEnterprise.vmx

disk.locking = "FALSE"

diskLib.dataCacheMaxSize = "0"

diskLib.dataCacheMaxReadAheadSize = "0"

diskLib.dataCacheMinReadAheadSize = "0"

diskLib.dataCachePageSize = "4096"

diskLib.maxUnsyncedWrites = "0"

 

config.version = "7"

virtualHW.version = "3"

scsi0.present = "TRUE"

scsi0.virtualDev = "lsilogic"

memsize = "524"

scsi0:0.present = "TRUE"

scsi0:0.fileName = "Windows Server 2003 Enterprise Edition.vmdk"

ide1:0.present = "TRUE"

ide1:0.fileName = "auto detect"

ide1:0.deviceType = "cdrom-raw"

floppy0.fileName = "A:"

Ethernet0.present = "TRUE"

sound.present = "TRUE"

sound.fileName = "-1"

displayName = "RAC1"

guestOS = "winNetEnterprise"

priority.grabbed = "normal"

priority.ungrabbed = "normal"

powerType.powerOff = "default"

powerType.powerOn = "default"

powerType.suspend = "default"

powerType.reset = "default"

 

ide1:0.startConnected = "TRUE"

Ethernet0.addressType = "generated"

uuid.location = "56 4d c5 0d df c2 22 83-78 96 e8 44 92 2d 88 e3"

uuid.bios = "56 4d c5 0d df c2 22 83-78 96 e8 44 92 2d 88 e3"

ethernet0.generatedAddress = "00:0c:29:2d:88:e3"

ethernet0.generatedAddressOffset = "0"

tools.syncTime = "FALSE"

 

scsi0:1.present = "TRUE"

scsi0:1.fileName = "Windows Server 2003 Enterprise Edition (3).vmdk"

sound.virtualDev = "es1371"

 

scsi1.present = "TRUE"

scsi1.virtualDev = "lsilogic"

scsi1.sharedBus = "VIRTUAL"

 

scsi1:1.present = "FALSE"

scsi1:1.mode = "persistent"

scsi1:1.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk"

scsi1:1.deviceType = "plainDisk"

 

scsi1:2.present = "TRUE"

scsi1:2.mode = "persistent"

scsi1:2.fileName = "C:\RACVIRTUAL\ASMDISK\ocr1.vmdk"

scsi1:2.deviceType = "plainDisk"

 

scsi1:3.present = "TRUE"

scsi1:3.mode = "persistent"

scsi1:3.fileName = "C:\RACVIRTUAL\ASMDISK\votingdisk1.vmdk"

scsi1:3.deviceType = "plainDisk"

 

scsi1:4.present = "TRUE"

scsi1:4.mode = "persistent"

scsi1:4.fileName = "Z:\ASMDISK\ocr2.vmdk"

scsi1:4.deviceType = "plainDisk"

 

scsi1:0.present = "TRUE"

scsi1:0.mode = "persistent"

scsi1:0.fileName = "Z:\ASMDISK\votingdisk2.vmdk"

scsi1:0.deviceType = "plainDisk"

 

Ethernet1.present = "TRUE"

 

Ethernet1.addressType = "generated"

ethernet1.generatedAddress = "00:0c:29:2d:88:ed"

ethernet1.generatedAddressOffset = "10"

 

floppy0.present = "FALSE"

redoLogDir = "."

 

Ethernet0.connectionType = "bridged"

 

scsi0:2.present = "FALSE"

scsi0:2.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk"

 

scsi0:3.present = "FALSE"

scsi0:3.fileName = "C:\RACVIRTUAL\ASMDISK\test9.vmdk"

 

scsi0:6.present = "FALSE"

scsi0:6.fileName = "C:\RACVIRTUAL\ASMDISK\test.vmdk"

 

ide0:0.present = "TRUE"

ide0:0.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk"

 

ide0:0.deviceType = "plainDisk"

ide0:1.present = "TRUE"

ide0:1.fileName = "Z:\ASMDISK\oradata2.vmdk"

ide0:1.deviceType = "plainDisk"

 

ide1:1.present = "FALSE"

ide1:1.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk"

 

scsi0:2.deviceType = "plainDisk"

 

scsi0:5.present = "FALSE"

scsi0:5.fileName = "C:\RACVIRTUAL\ASMDISK\oradata1.vmdk"

scsi0:5.deviceType = "plainDisk"

RAC3 Node vmx file: Note remote links to raw devices on RAC1

Location: Z:\RAC3\winNetEnterprise.vmx

disk.locking = "FALSE"

diskLib.dataCacheMaxSize = "0"

diskLib.dataCacheMaxReadAheadSize = "0"

diskLib.dataCacheMinReadAheadSize = "0"

diskLib.dataCachePageSize = "4096"

diskLib.maxUnsyncedWrites = "0"

 

config.version = "7"

virtualHW.version = "3"

scsi0.present = "TRUE"

scsi0.virtualDev = "lsilogic"

memsize = "524"

scsi0:0.present = "TRUE"

scsi0:0.fileName = "Windows Server 2003 Enterprise Edition.vmdk"

ide1:0.present = "TRUE"

ide1:0.fileName = "auto detect"

ide1:0.deviceType = "cdrom-raw"

floppy0.fileName = "A:"

Ethernet0.present = "TRUE"

sound.present = "TRUE"

sound.fileName = "-1"

displayName = "RAC3"

guestOS = "winNetEnterprise"

priority.grabbed = "normal"

priority.ungrabbed = "normal"

powerType.powerOff = "default"

powerType.powerOn = "default"

powerType.suspend = "default"

powerType.reset = "default"

 

ide1:0.startConnected = "TRUE"

Ethernet0.addressType = "generated"

uuid.location = "56 4d a4 62 67 78 2e 4e-bd 76 ea 69 ed 2d f9 03"

uuid.bios = "56 4d a4 62 67 78 2e 4e-bd 76 ea 69 ed 2d f9 03"

ethernet0.generatedAddress = "00:0c:29:2d:f9:03"

ethernet0.generatedAddressOffset = "0"

tools.syncTime = "FALSE"

 

scsi0:1.present = "TRUE"

scsi0:1.fileName = "Windows Server 2003 Enterprise Edition (3).vmdk"

sound.virtualDev = "es1371"

 

 

scsi1.present = "TRUE"

scsi1.virtualDev = "lsilogic"

scsi1.sharedBus = "VIRTUAL"

 

scsi1:1.present = "FALSE"

scsi1:1.mode = "persistent"

scsi1:1.fileName = "Z:\ASMDISK\oradata1.vmdk"

scsi1:1.deviceType = "plainDisk"

 

scsi1:2.present = "TRUE"

scsi1:2.mode = "persistent"

scsi1:2.fileName = "z:\ASMDISK\ocr1.vmdk"

scsi1:2.deviceType = "plainDisk"

 

scsi1:3.present = "TRUE"

scsi1:3.mode = "persistent"

scsi1:3.fileName = "z:\ASMDISK\votingdisk1.vmdk"

scsi1:3.deviceType = "plainDisk"

 

scsi1:4.present = "TRUE"

scsi1:4.mode = "persistent"

scsi1:4.fileName = "D:\RACVIRTUAL\ASMDISK\ocr2.vmdk"

scsi1:4.deviceType = "plainDisk"

 

scsi1:0.present = "TRUE"

scsi1:0.mode = "persistent"

scsi1:0.fileName = "D:\RACVIRTUAL\ASMDISK\votingdisk2.vmdk"

scsi1:0.deviceType = "plainDisk"

 

 

Ethernet1.present = "TRUE"

 

Ethernet1.addressType = "generated"

ethernet1.generatedAddress = "00:0c:29:2d:f9:0d"

ethernet1.generatedAddressOffset = "10"

 

floppy0.present = "FALSE"

redoLogDir = "."

 

Ethernet0.connectionType = "bridged"

 

 

ide0:0.present = "TRUE"

ide0:0.fileName = "D:\RACVIRTUAL\ASMDISK\oradata2.vmdk"

ide0:0.deviceType = "plainDisk"

 

ide0:1.present = "TRUE"

ide0:1.fileName = "Z:\ASMDISK\oradata1.vmdk"

ide0:1.deviceType = "plainDisk"
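Both nodes must end up seeing the same set of shared disks, one node through local paths and the other through the Z: mapped drive. A minimal sketch of that cross-check follows, under stated assumptions: the helper names are hypothetical, a disk counts as shared when it is marked present and its deviceType is plainDisk (as in the listings above), and disks are compared by file name only, since the folder differs per node.

```python
from pathlib import PureWindowsPath


def parse_vmx(text):
    """Parse 'key = "value"' lines into a dict with lower-case keys."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip().lower()] = value.strip().strip('"')
    return settings


def shared_disks(settings):
    """Return file names of present disks whose deviceType is plainDisk."""
    names = set()
    for key, value in settings.items():
        if not key.endswith(".filename"):
            continue
        device = key[: -len(".filename")]  # for example "scsi1:2"
        present = settings.get(device + ".present", "").upper() == "TRUE"
        plain = settings.get(device + ".devicetype", "").lower() == "plaindisk"
        if present and plain:
            # Compare by file name only; the folder differs on each node.
            names.add(PureWindowsPath(value).name.lower())
    return names


def compare_nodes(vmx_a, vmx_b):
    """Return (disks only in A, disks only in B) for two vmx texts."""
    a = shared_disks(parse_vmx(vmx_a))
    b = shared_disks(parse_vmx(vmx_b))
    return sorted(a - b), sorted(b - a)
```

Calling compare_nodes with the text of the two winNetEnterprise.vmx files should return two empty lists when both nodes reference the same six disks (ocr1, ocr2, votingdisk1, votingdisk2, oradata1, oradata2).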

 


Installation of Oracle Clusterware Services

 

·         Before installing Oracle CRS, let's verify that our two nodes fulfill all of the prerequisites for CRS installation. This is done by running a verification utility provided by Oracle called “runcluvfy.bat”. At the RAC1 command prompt, navigate to the CD-ROM drive where you have already inserted the Oracle 10g Enterprise CD.

Cd d:\

Cd D:\clusterware\cluvfy

runcluvfy.bat stage -pre crsinst -n rac1,rac3 -verbose

Verify that the only failure reported by the above command is the VIP verification; all other tests should pass.

·         Execute the D:\clusterware\Setup.exe which will launch oracle installer for Cluster Services.

·         In Specify Home Details screen, set the Name as oracrs and location as I:\oracle\product\10.2.0\crs.

·         Oracle will then check all of the prerequisites. Make sure all tests pass and then press Next.

·         In the Specify Cluster Configuration screen, you have to add the two nodes along with their public, private, and VIP names. For example, add RAC3 as the second node, with RAC3-priv and RAC3-vip as the names of its private and VIP interfaces.

·         In the next screen, Specify Network Interface Usage, edit the entries and make sure there are two interfaces, one marked Public (10.10.42.0 in my case) and the other Private (10.10.0.0 in my case).

·         In the Cluster Configuration Storage screen, you have to specify the two disks (main and mirror) for both the OCR and the voting disk. I should have specified three voting disks here, but I chose only two and will add the third later, simply because I would like to show how to change the RAC configuration after installation. You should be able to recognize the disks by their sizes (remember we chose 4GB for data files, and 200MB and 300MB for voting and OCR). You can also identify the disks by their names, verifying the names against the VMware machine settings for the disks you added. For the OCR you will see options for a primary OCR and a mirrored OCR location, but for the voting disk you have to specify its location multiple times. That means that for the OCR, one RAC node becomes the master (responsible for reads and writes to it and its mirror), and the other nodes communicate with the master node for OCR operations. With the voting disk, however, each node has to write to it separately.

·         Next oracle will begin installation. As you can see in the Installation screen, Oracle will go through the following steps:

1.   Install successful

2.   Setup Successful

3.   Remote Operation Pending

4.   Configuration Pending

5.   Oracle Clusterware configuration

6.   Oracle Notification Server configuration

7.   Oracle Private inter connect configuration

8.   Virtual Private IP configuration

·         Except for step 8, all steps should complete successfully. Ignore the failure, complete the install, and exit.

·         At this point we will have the following services configured in the windows service manager

1.   Oracle Object service

2.   Oracle cluster volume service

3.   Oracle CR Service

4.   Oracle CS service

5.   Oracle EVM service

 

·         You now need to run the VIP configuration assistant, since it was the step that failed. Run it from i:\oracle\product\10.2.0\crs\bin\vipca

·         The assistant will show a screen where you provide the RAC1-vip and RAC3-vip network names; then proceed to install the VIP services.

·         Basically this will install the following 3 resources (not as windows services)

1.   VIP application resource

2.   GSD Apps resource (Global Services Daemon)

3.   ONS Apps resource (Oracle Notification Service)

 

·         At this point your CRS installation is complete, and you should verify it by running: cluvfy.bat stage -post crsinst -n RAC1,RAC3

·         Following are the commands that can also be used to verify cluster health on both nodes:

I:\oracle\product\10.2.0\crs\bin\crs_stat -t

I:\oracle\product\10.2.0\crs\bin\crsctl check crs

I:\oracle\product\10.2.0\crs\bin\ocrcheck

·         Reboot both nodes and verify the CRS health again.

 


Installation of Oracle Software/ASM/DB

 

·         Make sure that the Cluster services are up and running on both nodes. Run from CD ROM, d:\database\setup.exe.

·         Select Enterprise Edition.

·         Choose Oracle Home (different from CRS) as oradb and location as i:\oracle\product\10.2.0\db_1

·         Make sure to check mark both nodes for s/w installation

·         Make sure all pre-requisites tests are passed

·         In the Select Configuration Option, choose ‘Install Database Software only’.

·         Complete the installation; you should receive no errors this time.

·         Now launch oracle database configuration assistant from the Programs group (not from CD).

·         Choose ‘Oracle RAC Database’ in the Welcome screen

·         In the next screen select ‘Configure Automatic Storage Management’.

·         Select both RAC nodes, provide password for ASM instance and choose pfile which means each ASM instance on RAC1 and RAC3 will have its own init.ora located on NTFS.

·         DBCA will then create ASM instance, and then you will see a screen where you have to create disk groups.

·         In the Create Disk Group screen, click on Stamp Disks and you should see all of your raw devices (from both nodes' storage). Select the two 4GB raw devices on the two nodes and accept the defaults.

·         Now you should be able to see both disks appear as candidates in the Create Disk group screen.

·         Select Redundancy as Normal and create the disk group as shown in the following picture.

 

 

 

·         The main group is DATA and its two sub groups (failure groups) are DATAP and DATAS.

·         Press OK and continue to complete.

·         ASM instance setup is now completed and we can begin creating a database.

·         Launch DBCA from programs group and follow the screens as under.

·         Choose Create Database, select both nodes, select ‘general purpose database template’, name the database as RACDB. The two instances will be RACDB1 and RACDB2.

·         Select ASM as the storage for the new database.

·         Select DATA as the ASM disk group for all database files. I will create more groups later and show how to distribute redo logs, archived logs, and the flash recovery area to their own ASM groups.

·         Use Oracle Managed files, it is always recommended to use OMF with ASM.

·         Do not specify Flash recovery area as it will be done later, but specify Archived log location.

·         Accept the defaults for the rest of the screens and continue until completion.


Post RAC Installation Health Check

 

Now that RAC is installed, let's perform some basic health checks.

 

--Assignment of Environment Variables

Right click on My Computer and select properties and go to Advanced tab and define the following environment variables on both servers.

 

SET CRS_HOME=I:\oracle\product\10.2.0\crs

SET ORACLE_HOME=I:\oracle\product\10.2.0\db_1

 

--Oracle Clusterware Services

Oracle Object Service           OracleOBJService.exe

OracleCRService                 crsd.exe

OracleCSService                 ocssd.exe

OracleEVMService                evmd.exe

OracleClusterVolumeService      OcfsFindVol.exe

 

--Oracle ASMServices

OracleASMService+ASM1           oracle.exe

 

--Oracle Database Services

OracleoradbTNSListenerLISTENER_RAC1  TNSLSNR.EXE

OracleServiceRACDB1             oracle.exe

OracleDBConsoleRACDB1           nmesrvc.exe

OracleJobSchedulerRACDB1       

 

--Link between Windows Services and Processes (in Task Manager)

Run command : TASKLIST /SVC (See the processes links above)

 

Here is a short description of each of the CRS daemon processes:

(Taken from Metalink Note: 259301.1)

CRSD:

- Engine for HA operation

- Manages 'application resources'

- Starts, stops, and fails 'application resources' over

- Spawns separate 'actions' to start/stop/check application resources

- Maintains configuration profiles in the OCR (Oracle Cluster Registry)

- Stores current known state in the OCR.

- Runs as root

- Is restarted automatically on failure

 

OCSSD:

- OCSSD is part of RAC and Single Instance with ASM

- Provides access to node membership

- Provides group services

- Provides basic cluster locking

- Integrates with existing vendor clusterware, when present

- Can also run without integration with vendor clusterware

- Runs as Oracle.

- Failure exit causes machine reboot. 

--- This is a feature to prevent data corruption in event of a split brain.

 

EVMD:

- Generates events when things happen

- Spawns a permanent child evmlogger

- Evmlogger, on demand, spawns children

- Scans callout directory and invokes callouts.

- Runs as Oracle.

- Restarted automatically on failure

 

--CRS_STAT: Check Health of Resources(ASM, Listener,DB, Instance etc)

cd %CRS_HOME%\bin

crs_stat -t

Name           Type           Target    State     Host

-----------------------------------------------------------

ora....B1.inst application    ONLINE    ONLINE    rac1

ora....B2.inst application    ONLINE    ONLINE    rac3

ora.RACDB.db   application    ONLINE    ONLINE    rac3

ora....SM1.asm application    ONLINE    ONLINE    rac1

ora....C1.lsnr application    ONLINE    ONLINE    rac1

ora.rac1.gsd   application    ONLINE    ONLINE    rac1

ora.rac1.ons   application    ONLINE    ONLINE    rac1

ora.rac1.vip   application    ONLINE    ONLINE    rac1

ora....SM2.asm application    ONLINE    ONLINE    rac3

ora....C3.lsnr application    ONLINE    ONLINE    rac3

ora.rac3.gsd   application    ONLINE    ONLINE    rac3

ora.rac3.ons   application    ONLINE    ONLINE    rac3

ora.rac3.vip   application    ONLINE    ONLINE    rac3
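When the listing grows to three nodes it becomes easy to miss a resource that is not ONLINE. The following is a minimal sketch for scanning saved crs_stat -t output: the function names are hypothetical, the column layout is taken from the sample above, and the handling of rows with an empty Host column (for resources that are not running) is an assumption.

```python
def parse_crs_stat(text):
    """Parse 'crs_stat -t' output into a list of row dicts."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule and anything else unexpected.
        # Assumption: a resource that is not running has no Host column.
        if len(fields) not in (4, 5) or fields[0] == "Name":
            continue
        rows.append({
            "name": fields[0],
            "type": fields[1],
            "target": fields[2],
            "state": fields[3],
            "host": fields[4] if len(fields) == 5 else None,
        })
    return rows


def not_online(rows):
    """Names of resources that should be ONLINE but are in another state."""
    return [r["name"] for r in rows
            if r["target"] == "ONLINE" and r["state"] != "ONLINE"]


if __name__ == "__main__":
    sample = (
        "Name           Type           Target    State     Host\n"
        "-----------------------------------------------------------\n"
        "ora.rac1.vip   application    ONLINE    ONLINE    rac1\n"
        "ora.rac3.vip   application    ONLINE    ONLINE    rac3\n"
    )
    print("problem resources:", not_online(parse_crs_stat(sample)))
```

The output of crs_stat -t can be redirected to a file and its contents passed to parse_crs_stat.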

 

crs_stat alone provides a listing with the full resource names.

 

crs_stat -f provides detailed information about each of the components.

 

--Start/stop all oracle services

crs_start -all

crs_stop -all

 

Start/Stop Individual services

crs_start resource_name -c cluster_member

crs_start resource_name

 

For example: crs_start ora.RACDB.RACDB1.inst

 

Please note that you can also use the srvctl command to start or stop services. It is the recommended tool, as it gives more control over each service group.

 

--CRSCTL : Controls RAC parameters

Checks health of cluster only

Crsctl check crs

CSS appears healthy

CRS appears healthy

EVM appears healthy
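The three "appears healthy" lines are easy to check automatically, for example after each reboot. A minimal sketch follows; the function name is hypothetical, and it relies only on the output format shown above.

```python
def unhealthy_daemons(output, expected=("CSS", "CRS", "EVM")):
    """Return the daemons that did not report 'appears healthy'."""
    healthy = set()
    for line in output.splitlines():
        words = line.split()
        if len(words) == 3 and words[1:] == ["appears", "healthy"]:
            healthy.add(words[0])
    return [d for d in expected if d not in healthy]


if __name__ == "__main__":
    sample = "CSS appears healthy\nCRS appears healthy\nEVM appears healthy\n"
    print("unhealthy:", unhealthy_daemons(sample))
```

An empty list means all three daemons reported healthy; any other line, including an error message, simply leaves that daemon in the returned list.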

 

Query voting disks location

crsctl query css votedisk

0.     0    \\.\votedsk1

1.     0    \\.\votedsk2
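The reason a third voting disk is planned can be shown with simple arithmetic: a node needs access to more than half of the configured voting disks, so two disks tolerate no loss while three tolerate one. The sketch below parses the output format shown above and computes that margin; the function names are hypothetical.

```python
def parse_votedisks(output):
    """Return the voting disk paths from 'crsctl query css votedisk' output."""
    paths = []
    for line in output.splitlines():
        fields = line.split()
        # Expected shape: index with a trailing dot, a number, then the path.
        if len(fields) == 3 and fields[0].rstrip(".").isdigit():
            paths.append(fields[2])
    return paths


def tolerated_failures(count):
    """Voting disks that can be lost while a strict majority remains."""
    return (count - 1) // 2 if count > 0 else 0


if __name__ == "__main__":
    sample = "0.     0    \\\\.\\votedsk1\n1.     0    \\\\.\\votedsk2\n"
    disks = parse_votedisks(sample)
    print(len(disks), "voting disks, tolerated failures:",
          tolerated_failures(len(disks)))
```

With the two voting disks configured so far the margin is zero, which is why an odd number such as three is generally preferred.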

 

 

 

Checks version of Clusterware (softwareversion reports the installed version, activeversion the active one)

crsctl query crs softwareversion

CRS software version on node [rac1] is [10.2.0.1.0]

 

You can also use the following GUI utility to find the location of the OCR and voting disks:

Run I:\oracle\product\10.2.0\crs\BIN\GUIOracleOBJManager.exe

 

 

 

Dumps cluster state to crsd.log

crsctl debug statedump crs

 

Debug specific components (level 2); see crsd.log

crsctl debug log crs "CRSTIMER:2"

 

Please note that dumping the cluster state is a one-time snapshot, while the other debug commands enable tracing modes at different levels.

For details on how to debug a Real Application Clusters environment, see http://download-uk.oracle.com/docs/cd/B19306_01/rac.102/b14197/appsupport.htm

 

See Appendix at the end of this document for more details.

 

-- Displays health of Oracle Cluster Registry

Ocrcheck

Status of Oracle Cluster Registry is as follows :

Version                  :          2

Total space (kbytes)     :     192652

Used space (kbytes)      :       3800

Available space (kbytes) :     188852

ID                       : 1953799442

Device/File Name         : \\.\ocrcfg

Device/File integrity check succeeded

Device/File Name         : \\.\ocrmirrorcfg

Device/File integrity check succeeded

Cluster registry integrity check succeeded

Make sure to check the log:

I:\oracle\product\10.2.0\db_1\log\rac3\client
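The ocrcheck output shown above lends itself to an automated check. The sketch below is an illustration only, with hypothetical function names: it extracts the space figures and the integrity messages from the text and computes how full the OCR is. The sample text is copied from the output above.

```python
import re

# Labels as they appear in the ocrcheck output.
LABELS = {
    "total_kb": "Total space (kbytes)",
    "used_kb": "Used space (kbytes)",
    "available_kb": "Available space (kbytes)",
}


def parse_ocrcheck(text):
    """Extract space figures and integrity results from ocrcheck output."""
    info = {}
    for key, label in LABELS.items():
        match = re.search(re.escape(label) + r"\s*:\s*(\d+)", text)
        if match:
            info[key] = int(match.group(1))
    # One success line is printed per OCR device (primary and mirror).
    info["devices_ok"] = text.count("Device/File integrity check succeeded")
    info["registry_ok"] = "Cluster registry integrity check succeeded" in text
    return info


def used_percent(info):
    """Percentage of OCR space in use."""
    return 100.0 * info["used_kb"] / info["total_kb"]


SAMPLE = r"""
Version                  :          2
Total space (kbytes)     :     192652
Used space (kbytes)      :       3800
Available space (kbytes) :     188852
ID                       : 1953799442
Device/File Name         : \\.\ocrcfg
Device/File integrity check succeeded
Device/File Name         : \\.\ocrmirrorcfg
Device/File integrity check succeeded
Cluster registry integrity check succeeded
"""

if __name__ == "__main__":
    info = parse_ocrcheck(SAMPLE)
    print(info, round(used_percent(info), 2))
```

With a mirrored OCR, devices_ok should equal 2; a lower count suggests that one of the two locations failed its integrity check.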

--Export OCR (used to back up, restore, or change the location of the OCR)

ocrconfig -export ocr.dmp -s online

ocrconfig -import ocr.dmp

Oracle automatically backs up the OCR every four hours to the cdata folder under CRS_HOME, but only on the master node.

You can use the command ocrconfig -showbackup to see existing backups.

 

ocrconfig -replace ocrmirror <new location>

 

ocrconfig -restore ocrbackup

 

Because the automatic four-hourly OCR backup is taken only on the master node, make sure to keep a copy of the backup files elsewhere.

 

 

ocrconfig -repair ocr <ocr_location>

 

ocrdump <file-name>  (dumps the OCR contents in ASCII format)

 

Managing the Cluster Database with the srvctl Command

 

srvctl <command> <object> [<options>]

 

I will explain this with an example. Let us suppose we need to stop all RAC resources (but not the Windows services such as CSS, CRS and EVM).

 

Recycle RAC Environment

Step1: Stop agent processes

SET ORACLE_SID=RACDB1

emctl status agent

emctl status dbconsole

emctl stop dbconsole

 

Repeat the same on RAC3 and then run emctl status to verify.

 

Step2: Stop database with all of its instances

srvctl stop database -d RACDB

 

Step3: Stop all ASM instances

srvctl stop asm -n RAC1

srvctl stop asm -n RAC3


Step4: Stop VIP, GSD, ONS and listener services

srvctl stop nodeapps -n RAC1

srvctl stop nodeapps -n RAC3

 

Step5: Start VIP, GSD, ONS and listener services

srvctl start nodeapps -n RAC1

srvctl start nodeapps -n RAC3

 

Step6: Start ASM instances

srvctl start asm -n RAC1

srvctl start asm -n RAC3

 

Step7: Start database and instances

srvctl start database -d RACDB

 

Step8: Start dbconsole and agent

set ORACLE_SID=RACDB1

emctl start dbconsole

Repeat this step on RAC3 as well.

 

Make sure RAC is up by running the crs_stat -t command.
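The ordering in Steps 2 to 7 matters: the database goes down before ASM, ASM before the node applications, and everything comes back in the reverse order. As a rough sketch (the helper name recycle_plan is hypothetical), the following Python function only builds the srvctl command list in that order and does not execute anything. The emctl steps are left out because they depend on the ORACLE_SID set on each node.

```python
def recycle_plan(db_name, nodes):
    """Return the srvctl commands for a full stop/start cycle, in order."""
    # Stop phase: database first, then ASM, then node applications.
    plan = [f"srvctl stop database -d {db_name}"]
    plan += [f"srvctl stop asm -n {node}" for node in nodes]
    plan += [f"srvctl stop nodeapps -n {node}" for node in nodes]
    # Start phase: exactly the reverse layering.
    plan += [f"srvctl start nodeapps -n {node}" for node in nodes]
    plan += [f"srvctl start asm -n {node}" for node in nodes]
    plan.append(f"srvctl start database -d {db_name}")
    return plan


if __name__ == "__main__":
    for command in recycle_plan("RACDB", ["RAC1", "RAC3"]):
        print(command)
```

Printing the plan first and running the commands by hand keeps the operator in control, which seems prudent for a full cluster recycle.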

 

--Accessing RAC Environment from EM console.

Make sure the agent is up and then open Internet Explorer:

http://RAC1:1158/em where RAC1 is the dbconsole node.

 

I highly recommend that DBAs get familiar with EM; it is an excellent GUI tool to monitor and manage your RAC environment. For whatever action you perform, you can also see the corresponding SQL that will be run.

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 


Applying Oracle 10.2.0.3 Patch set

 

Now that RAC is installed on the two nodes, RAC1 and RAC3, we are almost ready to add a third node, RAC5. Before doing so, I recommend patching the two RAC nodes with the latest Oracle patch set for both the cluster layer and the database (ASM included).

 

Major steps in applying the patch.

 

·         On both nodes, make sure all services/components of RAC are down. Applying the Oracle patch requires that all Oracle services are down.

·         Create a folder C:\10203Patch on the RAC1 node and unzip the patch contents into it. Run C:\10203Patch\Setup.exe.

·         Select ORACRS, the Clusterware stack, as the first home to be patched.

·         On the next screen make sure both RAC Nodes are checked and proceed to complete the installation.

·         After the installation is over, on each of the two nodes, run the following from command prompt:

I:\oracle\product\10.2.0\crs\install\patch102.bat

·         At this point, you should notice in Windows service manager that Cluster related services will be up and running. You can verify that the Cluster layer is patched with 10.2.0.3 by running the following command:

crsctl query crs softwareversion

crsctl query crs activeversion

·         Stop all cluster services once again. Now run setup.exe again and this time choose the oradb_1 home to patch the Oracle ASM and database software. Complete the patch installation. During DB home patching you may receive errors such as "file in use"; if so, go to the file's folder in Windows Explorer, remove that file, and then retry the operation.

·         After the installation is over, execute <Oracle Home>\bin\SelectHome.bat on the remote node (RAC3) to activate the following products:

Oracle Data Provider for .NET

Oracle Provider for OLE DB

Oracle Objects for OLE

Oracle Counters for Windows Performance Monitor

Oracle Administration Assistant

·         After this point, set all RAC services on Windows to Manual start except two services, Oracle Object Service and OracleClusterVolumeService, and reboot both nodes. Once the nodes are up, start the services in the following order:

§         OracleCSService

§         OracleEVMService

§         OracleCRService

·         Make sure the database services are down; otherwise run the following command:

srvctl stop database -d RACDB

·         Now you are ready to run catupgrd.sql against the data dictionary as the last step in patching the database. However, you need to make sure the SGA components are large enough (shared pool and java pool should be at least 150 MB each). Since I am using Automatic SGA memory management, I will increase SGA_TARGET from 150 MB to 300 MB for the instance; you can revert it to 150 MB after the patch is deployed.

·         From the RAC1 node, log on to sqlplus after setting ORACLE_SID=RACDB1. Save the pfile initRACDB1.ora (located on the local node under %ORACLE_HOME%/database) under another file name. From sqlplus run create pfile from spfile; then open initRACDB1.ora and change the parameter values, and from sqlplus run create spfile='+DATA/RACDB/spfileRACDB.ora' from pfile; Then restore the saved init file as the original initRACDB1.ora so that it contains only the following line:

SPFILE='+DATA/RACDB/spfileRACDB.ora'

·         Also remove the local file spfileracdb1.ora from the database folder as it is not required.

·         Now start up the instance in mount mode, turn archiving off by running alter database noarchivelog; and also run alter system set cluster_database=FALSE scope=spfile; then shut down the instance.

·         Start up the instance again as follows:

1.   startup upgrade

2.   spool patch10203.log

3.   @I:\oracle\product\10.2.0\db_1\RDBMS\ADMIN\catupgrd.sql

·         Review the log file for any errors and make sure all database components are shown as updated to the 10.2.0.3 patch set:

Component                                Status         Version  HH:MM:SS

Oracle Database Server                    VALID      10.2.0.3.0  00:19:33

JServer JAVA Virtual Machine              VALID      10.2.0.3.0  00:06:23

Oracle XDK                                VALID      10.2.0.3.0  00:01:40

Oracle Database Java Packages             VALID      10.2.0.3.0  00:01:03

Oracle Text                               VALID      10.2.0.3.0  00:00:33

Oracle XML Database                       VALID      10.2.0.3.0  00:01:37

Oracle Real Application Clusters          VALID      10.2.0.3.0  00:00:02

Oracle Data Mining                        VALID      10.2.0.3.0  00:00:34

OLAP Analytic Workspace                   VALID      10.2.0.3.0  00:00:59

OLAP Catalog                              VALID      10.2.0.3.0  00:01:41

Oracle OLAP API                           VALID      10.2.0.3.0  00:01:20

Oracle interMedia                         VALID      10.2.0.3.0  00:08:31

Spatial                                   VALID      10.2.0.3.0  00:06:41

Oracle Expression Filter                  VALID      10.2.0.3.0  00:00:30

Oracle Enterprise Manager                 VALID      10.2.0.3.0  00:02:32

Oracle Rule Manager                       VALID      10.2.0.3.0  00:00:13

·         Run utlrp.sql to recompile all invalid objects (otherwise they are recompiled when first accessed).

·         alter system set cluster_database=TRUE scope=spfile;

·         SHUTDOWN

·         STARTUP MOUNT

·         ALTER DATABASE ARCHIVELOG;

·         SHUTDOWN

·         STARTUP

·         srvctl start database -d RACDB

·         crs_stat -t should now show the database instances up and running.

At this point you have successfully deployed oracle 10.2.0.3 patch set.
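Reviewing the component summary in the upgrade log by eye is error-prone, so the check can be scripted. The following Python sketch assumes the summary rows keep the column layout shown in the log review step (name, status, version and elapsed time separated by two or more spaces). The function names and the sample, including its INVALID row, are hypothetical.

```python
import re


def parse_components(text):
    """Return (name, status, version) tuples from the upgrade summary."""
    rows = []
    for line in text.splitlines():
        # Drop surrounding blanks and any leading bullet character.
        parts = re.split(r"\s{2,}", line.strip(" \t\u00b7"))
        # Data rows have four columns and a dotted numeric version;
        # this skips the header row and unrelated log lines.
        if len(parts) == 4 and re.fullmatch(r"\d+(\.\d+)+", parts[2]):
            rows.append((parts[0], parts[1], parts[2]))
    return rows


def not_patched(rows, version="10.2.0.3.0"):
    """Names of components that are not VALID at the expected version."""
    return [name for name, status, ver in rows
            if status != "VALID" or ver != version]


# Hypothetical log excerpt with one failed component.
SAMPLE = """\
Component                                Status         Version  HH:MM:SS
Oracle Database Server                    VALID      10.2.0.3.0  00:19:33
Oracle Text                               VALID      10.2.0.3.0  00:00:33
Oracle XML Database                     INVALID      10.2.0.3.0  00:01:37
"""

if __name__ == "__main__":
    print(not_patched(parse_components(SAMPLE)))
```

An empty result means every component listed in the log is VALID at 10.2.0.3.0.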

 

 

 

 

 

 

 


Adding a third Node to the Cluster Database

 

Now that RAC is installed on the two nodes, RAC1 and RAC3, we are ready to create RAC5 as the third node. RAC5 has already been created as a virtual machine on workstation XPWS3.

RAC5 needs to be configured with the following parameters:

·         Assign the IP addresses below to RAC5 and add them to the hosts file of the other two nodes; likewise, copy the IP addresses of the existing two nodes to the hosts file of RAC5. RAC5 will have the following IP addresses:

10.10.42.73     RAC5

10.10.0.73      RAC5-PRIV

10.10.42.252    RAC5-VIP

·         Map network drives on XPWS3 as Y and W to point to ASM folders of RAC1 and RAC3 with full permission.

·         Following is the excerpt from RAC5 OS winNetEnterprise.vmx:

disk.locking = "FALSE"

diskLib.dataCacheMaxSize = "0"

diskLib.dataCacheMaxReadAheadSize = "0"

diskLib.dataCacheMinReadAheadSize = "0"

diskLib.dataCachePageSize = "4096"

diskLib.maxUnsyncedWrites = "0"

 

scsi1.present = "TRUE"

scsi1.virtualDev = "lsilogic"

scsi1.sharedBus = "VIRTUAL"

 

 

config.version = "7"

virtualHW.version = "3"

scsi0.present = "TRUE"

scsi0.virtualDev = "lsilogic"

memsize = "540"

scsi0:0.present = "TRUE"

scsi0:0.fileName = "Windows Server 2003 Enterprise Edition.vmdk"

ide1:0.present = "FALSE"

ide1:0.fileName = "auto detect"

ide1:0.deviceType = "cdrom-raw"

floppy0.fileName = "A:"

Ethernet0.present = "TRUE"

sound.present = "TRUE"

sound.fileName = "-1"

displayName = "rac5"

guestOS = "winNetEnterprise"

priority.grabbed = "normal"

priority.ungrabbed = "normal"

powerType.powerOff = "default"

powerType.powerOn = "default"

powerType.suspend = "default"

powerType.reset = "default"

 

ide1:0.startConnected = "TRUE"

Ethernet0.addressType = "generated"

uuid.location = "56 4d 49 04 57 26 bc 40-3f 27 76 e5 1c 6a 0b 18"

uuid.bios = "56 4d 49 04 57 26 bc 40-3f 27 76 e5 1c 6a 0b 18"

ethernet0.generatedAddress = "00:0c:29:6a:0b:18"

ethernet0.generatedAddressOffset = "0"

tools.syncTime = "TRUE"

 

scsi0:1.present = "TRUE"

scsi0:1.fileName = "Windows Server 2003 Enterprise Edition (3).vmdk"

sound.virtualDev = "es1371"

 

Ethernet1.present = "TRUE"

 

Ethernet1.addressType = "generated"

ethernet1.generatedAddress = "00:0c:29:6a:0b:22"

ethernet1.generatedAddressOffset = "10"

scsi0:2.present = "TRUE"

scsi0:2.fileName = "Y:\ASMDISK\ocr1.vmdk"

 

scsi0:3.present = "TRUE"

scsi0:3.fileName = "Y:\ASMDISK\votingdisk1.vmdk"

scsi1:0.present = "TRUE"

scsi1:0.fileName = "W:\ASMDISK\ocr2.vmdk"

scsi1:1.present = "TRUE"

scsi1:1.fileName = "W:\ASMDISK\votingdisk2.vmdk"

 

floppy0.present = "FALSE"

 

ide0:0.present = "TRUE"

ide0:0.fileName = "Y:\ASMDISK\oradata1.vmdk"

ide0:0.deviceType = "plainDisk"

 

ide0:1.present = "TRUE"

ide0:1.fileName = "W:\ASMDISK\oradata2.vmdk"

ide0:1.deviceType = "plainDisk"

·         Now we are ready to add the third RAC instance, RACDB3, on server RAC5.

 

Run the following commands from Existing Node RAC1:

 

·         cluvfy comp peer -refnode rac1 -n rac5 (compares RAC5 against the reference node RAC1)

·         Install the Clusterware stack software on RAC5 from RAC1 as:

cd I:\oracle\product\10.2.0\crs\oui\BIN

addnode.bat

 

·         Press Next on the welcome screen, provide the public and private IP addresses of the new node, and complete the installation.

·         The above procedure installs I:\oracle\product\10.2.0\crs on RAC5 and creates the cluster services, but does not start them (except the first two, the object service and the cluster volume service).

·         cd I:\oracle\product\10.2.0\crs\install\

·         I:\oracle\product\10.2.0\crs\install>crssetup.add.bat

·         You should receive the following messages; make sure there are no errors, including for the VIP services.

Step 1:  checking status of CRS stack

Step 2:  Configuring basic cluster services

Step 3:  configuring OCR repository with new nodes

clscfg: EXISTING configuration version 3 detected.

clscfg: version 3 is 10G Release 2.

Attempting to add 1 new nodes to the configuration

Using ports: CSS=49895 CRS=49896 EVMC=49898 and EVMR=49897.

node <nodenumber>: <nodename> <private interconnect name> <hostname>

node 3: rac5 rac5-priv rac5

Creating OCR keys for user 'administrator', privgrp ''..

Operation successful.

Step 4:  configuring safe mode for CRS components

Step 5:  starting up the CRS stack on new nodes

Step 6:  configuring OCR with new node VIP information

 

Creating VIP application resource on (1) nodes..

Creating GSD application resource on (1) nodes..

Creating ONS application resource on (1) nodes..

Starting VIP application resource on (1) nodes..

Starting GSD application resource on (1) nodes..

Starting ONS application resource on (1) nodes..

·         At this point all cluster services on the new node RAC5 should have started automatically; this marks the end of the cluster stack installation for the new node.

·         If you run crs_stat -t from node 3 (RAC5) you should see the gsd, ons and vip services up and running.

·         Now you are ready to install the Oracle software on the new node.

cd I:\oracle\product\10.2.0\db_1\oui\bin

addnode.bat

Complete the install process.

·         Go to the RAC5 node, run the Network Configuration Assistant and create the listener with default settings. Make sure the listener is created only for the RAC5 node. Run crs_stat -t and you should see the listener resource appearing alongside the ons, vip and gsd services.

Now you can go back to the RAC1 node, run the DBCA GUI tool and follow the screens to add the RACDB3 instance on the RAC5 node (third node). However, I always prefer the manual approach, which is explained below.

Perform the following steps from the RAC5 node.

·         Creating ASM instance on RAC5 Node:

Create admin folder under I:\oracle\product\10.2.0

Copy all the contents of admin folder from RAC1.

You now have two subfolders as +ASM and RACDB.

Modify init+asm3.ora to include +ASM3.instance_number=3

Copy +ASM3.instance_number=3 to the init.ora of the other ASM instances as well; this is not required now, but it helps if you later want to create an spfile for ASM.

Open a command prompt in %ORACLE_HOME%\database

set ORACLE_SID=+ASM3

orapwd file=PWD+ASM3.ora password=password

oradim -new -ASMSID +ASM3 (creates the Windows service)

Make sure asmtoolg shows the oradata ASM disk group disks.

From sqlplus, start the instance with the startup command.

You should now see the ASM instance running; you can verify this by running select * from v$asm_diskgroup

However, you still need to add the ASM instance to the cluster stack, so shut down the ASM instance, go to the CRS home bin folder and run the following commands:

     srvctl add asm -n RAC5 -i +ASM3 -o %ORACLE_HOME%

     srvctl start asm -n RAC5

The second command starts the ASM instance under cluster control.

 

·         Creating Database instance on RAC5 Node:

set ORACLE_SID=RACDB3

orapwd file=PWDRACDB3.ORA password=password

Create a pfile from the spfile, edit the contents of initRACDB3.ora as shown below, and then create the spfile again from the edited pfile:

 

racdb1.__db_cache_size=201326592

racdb2.__db_cache_size=159383552

racdb3.__db_cache_size=159383552

racdb1.__java_pool_size=4194304

racdb2.__java_pool_size=4194304

racdb3.__java_pool_size=4194304

racdb1.__large_pool_size=4194304

racdb2.__large_pool_size=4194304

racdb3.__large_pool_size=4194304

racdb1.__shared_pool_size=121634816

racdb2.__shared_pool_size=96468992

racdb3.__shared_pool_size=96468992

racdb1.__streams_pool_size=0

racdb2.__streams_pool_size=0

racdb3.__streams_pool_size=0

*.cluster_database_instances=3

*.audit_file_dest='I:\oracle\product\10.2.0/admin/RACDB/adump'

*.background_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/bdump'

*.cluster_database=TRUE

*.compatible='10.2.0.1.0'

*.control_files='+DATA/racdb/controlfile/current.260.626900241'

*.core_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/cdump'

*.db_block_size=8192

*.db_create_file_dest='+DATA'

*.db_domain=''

*.db_file_multiblock_read_count=16

*.db_name='RACDB'

*.dispatchers='(PROTOCOL=TCP) (SERVICE=RACDBXDB)'

RACDB3.instance_number=3

RACDB2.instance_number=2

RACDB1.instance_number=1

*.job_queue_processes=10

*.open_cursors=300

*.pga_aggregate_target=16777216

*.processes=150

*.remote_listener='LISTENERS_RACDB'

*.remote_login_passwordfile='exclusive'

*.sga_target=268435456

RACDB3.thread=3

RACDB2.thread=2

RACDB1.thread=1

*.undo_management='AUTO'

RACDB3.undo_tablespace='UNDOTBS3'

RACDB2.undo_tablespace='UNDOTBS2'

RACDB1.undo_tablespace='UNDOTBS1'

*.user_dump_dest='I:\oracle\product\10.2.0/admin/RACDB/udump'
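The per-instance entries in the listing above follow a regular pattern, which makes them easy to get wrong by hand when adding a node. The Python sketch below is illustrative only and uses hypothetical helper names: it generates the three instance-specific lines for a given instance number and reports which expected entries are missing from a pfile. It assumes the naming used in this document (RACDBn, UNDOTBSn, thread equal to instance number).

```python
def instance_params(db_name, number):
    """Instance-specific pfile lines for instance <db_name><number>."""
    sid = f"{db_name}{number}"
    return [
        f"{sid}.instance_number={number}",
        f"{sid}.thread={number}",
        f"{sid}.undo_tablespace='UNDOTBS{number}'",
    ]


def missing_entries(pfile_lines, db_name, instance_count):
    """Expected entries that do not appear in the pfile (case-insensitive)."""
    expected = [f"*.cluster_database_instances={instance_count}"]
    for number in range(1, instance_count + 1):
        expected += instance_params(db_name, number)
    present = {line.strip().lower() for line in pfile_lines if line.strip()}
    return [entry for entry in expected if entry.lower() not in present]


if __name__ == "__main__":
    for line in instance_params("RACDB", 3):
        print(line)
```

Running missing_entries against the edited initRACDB3.ora before recreating the spfile is a cheap way to catch a forgotten thread or undo tablespace line.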

 

Now create the Windows service for the new instance: oradim -NEW -SID RACDB3

 

startup pfile=initRACDB3.ora nomount;

alter database mount;

ORA-01618: redo thread 3 is not enabled - cannot mount

 

To overcome this error, create the redo log groups for the new thread from RAC1 as:

alter database add logfile thread 3 group 5;

alter database add logfile thread 3 group 6;

alter database enable public thread 3;

 

Also create the undo tablespace as:

     create undo tablespace UNDOTBS3;

 

Register the new database instance with the cluster:

srvctl add instance -d RACDB -i RACDB3 -n RAC5

 

Now shut down the database with srvctl stop database -d RACDB

Now start the database with srvctl start database -d RACDB

 

At this point all three instances are up and running. If you encounter issues such as the cluster_database_instances value being out of sync for one of the instances, it means that the instance was not started with the proper spfile, which has cluster_database_instances=3.


RAC Concepts Primer

 

Oracle Clusterware

 

With 10g, you do not necessarily need third-party cluster software for a RAC implementation, as Oracle Clusterware provides the clustering support. Oracle Clusterware enables RAC nodes to communicate with each other and work as a single logical RAC server.

 

Oracle Cluster Registry (OCR)

OCR maintains RAC application resources and their availability. It is created on shared storage accessible to all nodes.

Oracle Clusterware reads ocr.loc (on Unix) or the registry values (on Windows) to find the location of the OCR file, and then reads the OCR contents to find out which resources need to be started on the RAC nodes.

Each RAC node maintains a copy of the OCR in memory. It is important to note that only one OCR process in the cluster (designated as the master) performs any disk I/O activity. Once information is read by this master OCR process, it is replicated from the local OCR cache to the OCR cache on the other nodes in the cluster.

The OCR file contains information for all of the cluster layers: System, Database, and CRS. The information relating to System includes CSS, EVM, CRS, ORA_CRS_HOME etc.

 

Cluster Synchronization Services (CSS)

CSS maintains the membership of each RAC node in the cluster through the voting disk, which is also stored on the shared storage subsystem. CSS is the first process started in the Oracle Clusterware stack.

 

CSS performs the following:

 

1.   Oracle Clusterware determines the location of the OCR from the ocr.loc file during system startup. It then reads the OCR file to determine the location of the voting disk.

2.   The voting disk is required to determine the names and number of members in the cluster.

3.   CSS then brings the voting disks online.

4.   CSS then establishes a connection to all RAC nodes using the private interconnect.

5.   Once the connection is established between the listeners of the various RAC nodes, each node that is able to access the voting disk(s) is changed to ACTIVE status.

6.   CSS authorizes the first node that attains the ACTIVE state as the MASTER node unless a MASTER node is already assigned.

7.   All of the ACTIVE RAC nodes then register themselves with the MASTER node.

8.   Finally, a new incarnation of the cluster is established.

 

Oracle Clusterware Stack

The main processes that compose the Oracle Clusterware stack are:

1.   Cluster Synchronization Service(CSS)

2.   Event Manager Service (EVMD)

3.   Cluster Ready Service (CRSD)

4.   RACGIMON process

5.   PROCD process.

1.   Cluster Synchronization Service Daemon:

Cluster Synchronization Service Daemon (CSSD) is responsible for synchronization between the various resources in the cluster. A failure of this process will cause the relevant RAC node to reboot. These services are performed by the Node Membership (NM) and the Group Membership (GM) services.

Node Membership Service (NM) has the following roles:

o        Checks the heartbeat across RAC nodes every second

o        Checks the disk heartbeat by performing a read/write operation every second

o        If no heartbeat is received from a node for more than 60 seconds, the master node evicts the problematic node from the cluster.

o        Queries the voting disk to determine whether any RAC node is unable to write to it.

The GM provides membership services. All clients that perform I/O operations register with the GM (e.g., LMON, DBWR). Reconfiguration of instances (when an instance joins or leaves the cluster) is also handled by GM. When a node fails, the GM sends out messages to other instances regarding the status, so it also acts as gateway for messages.
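The eviction rule described under Node Membership above can be illustrated with a small simulation. This is only a sketch of the logic as stated in this document (heartbeat every second, eviction after more than 60 seconds of silence), not Oracle's implementation; the names and timestamps are hypothetical.

```python
# Threshold taken from the text above: more than 60 seconds without
# a heartbeat leads to eviction.
EVICT_AFTER_SECONDS = 60


def nodes_to_evict(last_heartbeat, now, threshold=EVICT_AFTER_SECONDS):
    """Return the nodes whose last heartbeat is older than the threshold.

    last_heartbeat maps node name -> time (in seconds) the node was last heard.
    """
    return sorted(node for node, seen in last_heartbeat.items()
                  if now - seen > threshold)


if __name__ == "__main__":
    # rac3 was last heard 65 seconds ago, rac5 exactly 60 seconds ago.
    heard = {"rac1": 100, "rac3": 35, "rac5": 40}
    print(nodes_to_evict(heard, now=100))
```

Note the strict comparison: a node that has been silent for exactly 60 seconds is not yet evicted in this sketch, matching the "more than 60 sec" wording.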

 

2.   Event Manager Daemon (EVMD)

The EVMD is an event-forwarding process that sends events through the Oracle Notification Service (ONS). All communications between the CRS and CSS happen via this process.

 

3.   Cluster Ready Service Daemon (CRSD)

The CRSD process is used to define and manage resources. Resources have profiles that define metadata about them in the OCR. This process manages the application resources, i.e. it starts and stops them and manages failover. If the daemon fails, it is restarted automatically. The OCR information is cached inside CRS. This process also starts and communicates with the RACGIMON process.

Resources that are managed by the CRS include:

     Global Service Daemon (GSD),

     ONS Daemon,

     Virtual Internet Protocol (VIP),

     Listeners,

     Databases,

     Instances,

     3rd party Services.

 

4.   RACGIMON Daemon

RACGIMON is a database health check process monitor, and also performs the tasks of starting, stopping, and failing over services. When the node that houses it fails, the CRS process starts RACGIMON on the MASTER node among the surviving nodes.

 

5.   PROCD Process

PROCD is also a process monitor. It runs on hardware platforms that support other third-party cluster managers and is present only on platforms other than Linux; for example, it is present on AIX machines.

 

Additional Notes:

The voting disk is a shared disk accessed by all the nodes as a central reference; it keeps the heartbeat information between the nodes. If any node is unable to access the voting disk, the cluster immediately recognizes the communication failure and the master node evicts the failed node from the cluster group to prevent data corruption. You should always have three voting disks in different locations to avoid a split-brain situation, which can result in corruption.
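The reason for recommending three voting disks rather than two comes down to majority arithmetic: to my understanding, a node must be able to access more than half of the configured voting disks to remain in the cluster. The following Python sketch (the function name is hypothetical) shows why two disks give no protection against losing one, while three do.

```python
def has_voting_quorum(accessible, configured):
    """True if a node can access a strict majority of the voting disks."""
    return accessible > configured // 2


if __name__ == "__main__":
    for configured in (1, 2, 3):
        # Simulate the loss of a single voting disk.
        survives = has_voting_quorum(configured - 1, configured)
        print(f"{configured} disk(s), one lost -> node survives: {survives}")
```

For an extended cluster this suggests placing the third voting disk at a location separate from both data centers, so that the loss of either site still leaves two of three disks reachable.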

 

A Virtual IP is required to keep applications highly available. Database listeners are configured to listen on VIP addresses instead of the public ones. When a node goes down, its VIP resource fails over to another surviving node, so client connections addressed to the failed node are rejected immediately instead of waiting for a TCP timeout, and the clients can then connect to the surviving RAC nodes.

 

Cluster Interconnect is a communication network used by the cluster nodes for the synchronization of resources and is also used to transfer instance-specific data from one instance to another. The network layer should be dedicated to the RAC and have high bandwidth with low latency.

 

Cache Fusion uses a high-speed interprocess communication network for cache-to-cache transfer of data blocks between RAC instances. It addresses transaction concurrency between instances.

 

RAC Background Processes

RAC Instance will have the usual background processes that a single non-RAC instance has plus additional processes specifically required for the RAC environment.

 

1.   LMS (Lock Manager Service): Global Cache Services Process

LMS is the process used in Cache Fusion. Its functions are:

·         Enables consistent copies of blocks to be transferred between instances.

·         Rolls back uncommitted transactions for blocks that are being requested for consistent read by a remote instance.

·         The number of LMS processes is driven by the GCS_SERVER_PROCESSES parameter; the processes are named, for example, ora_lms0 .. ora_lms9

 

2.   LMON (Lock Monitor): Global Enqueue Services Monitor

LMON is a monitor process which manages:

·         Instance deaths and associated recovery for the failed node

·         Cluster/lock reconfiguration when a new instance joins or an existing instance is evicted from the RAC

·         Consistency of GCS memory in case any LMSx process dies.

 

3.   LMD (Lock Manager Daemon): Global Enqueue Services Daemon

It is a process responsible for:

·         Managing requests for resources and controlling access to blocks and global enqueues

·         Handling global deadlock detection and remote resource requests.

 

4.   LCK: Lock Process

Primary function is to manage non-cache fusion resource requests such as library, row cache, and lock requests that are local to the instance.

 

5.   DIAG: Diagnostic Daemon

·         Monitors health of the RAC instances and captures diagnostic data regarding process failures in an instance.

·         Note that PMON restarts a new DIAG process to continue its service in case DIAG process dies.

 

Additional Notes:

The GCS and GES processes on each RAC-Node manage the cache synchronization by using the cluster interconnect network layer.

 

In a clustered database environment, there are different scenarios of block sharing, which can be categorized as follows:

 

·         Concurrent reads on multiple nodes occur when two or more instances need to read the same block of data.

 

·         Concurrent reads and writes on different nodes are a combination of I/O operations on a single block of data. A block available on any of the instances is modified by another instance while a different copy of the data is maintained.

 

·         Concurrent writes on different nodes occur when multiple instances want to change the same data block frequently.

 

 

 

 

 


 

Troubleshooting RAC Environment

 

Now that you have created a three-node RAC with storage extended from RAC1 to RAC3 with normal redundancy, we are ready to create a third voting disk. Before we do that, I would like to share some of the issues I faced and the methods to resolve them. These issues may not arise in a stable environment like AIX/HP over SAN storage, but having the RAC tested over VMWare/Windows has its benefits in terms of troubleshooting.

On Windows, you have the following services on each RAC node. Please start the services in the following order.

Oracle Object service      (Keep Auto Start)

Oracle cluster volume service (Keep Auto Start)

Oracle CS service          (Keep Auto Start)

Oracle EVM service

Oracle CR Service

 

Sometimes you will not be able to start the EVM service; in that case make sure that the CS service is started on all nodes.

 

 

1.   Creation of OCR mirror:

ocrconfig -replace ocrmirror \\.\ocrmirrorcfg

 

2.   Cluster components/services not starting.

Sometimes you receive a message that the <name> resource is not registered with the cluster, even though you are able to see the resource when you type crs_stat -t.

I had this problem with the database resource and its instances, so I did the following to resolve it:

 

srvctl remove database -d RACDB

(this will remove the db resource and also all registered instances)

crs_unregister ora.RACDB.db

srvctl add database -d RACDB -o %ORACLE_HOME%

srvctl add instance -d RACDB -i RACDB1 -n RAC1

srvctl add instance -d RACDB -i RACDB2 -n RAC3

srvctl start database -d RACDB

 

You can also remove a particular instance by running the command: srvctl remove instance -d RACDB -i RACDB1

 

3.   On Windows you are sometimes unable to start the CRSD service, and in the crsd.log file you will notice network timeout ORA- errors. In that case, add the following parameter to sqlnet.ora to increase the timeout from 10 seconds:

sqlnet.inbound_connect_timeout=600

4.   Miscellaneous cluster commands:

srvctl start instance -d RACDB -i RACDB2

srvctl status instance -d RACDB -i RACDB2

srvctl status database -d RACDB

crs_stat -t -v

srvctl add asm -n RAC5 -i +ASM3 -o %ORACLE_HOME%

srvctl start asm -n rac5

ocrcheck

--starts a specific resource

crs_start <resource> -c <member>

crs_start ora.RACDB.RACDB2.inst -c RAC3

 

5.   OCR Corrupted when starting crsd service

After I added the 3rd node, starting the crsd service failed with this message in crsd.log: Incorrect SV stored in OCR. Key [SYSTEM.version.node_numbers.node3] Value []

I ran ocrdump ocr.txt, opened the file in a text editor and found that the value of SYSTEM.version.node_numbers.node3 should be 10.2.0.3.0, as for the other nodes.

I exported the OCR using ocrconfig -export ocr.dmp, opened the ocr.dmp file in a hex editor and added the value. I then imported it back using ocrconfig -import ocr.dmp and it worked.

This method, however, is not supported by Oracle; the only supported way out was to recreate the OCR or re-install the RAC5 node.

 

6.   RAC Logs

While investigating various problems, you should be familiar with the following log files.

 

RAC Node Alert Log

Location:  I:\oracle\product\10.2.0\crs\log\rac3\alertrac3.log

This log records status information for the entire cluster. For example, when you start the cluster services, it shows the voting disks being brought online. You can also find information about all cluster members being active, and OCR information when the OCR is changed, for example during an OCR upgrade. You will not find detailed information about individual cluster components here; only summary information is logged.

 

Cluster Services Log

Location:  I:\oracle\product\10.2.0\crs\log\rac3\client

Here you can find log files like cssn.log, which record client information about any missing entries in the registry. For example, when your mirrored OCR gets corrupted, you will see a log entry saying that the corresponding location could not be found:

2007-07-16 10:43:47.784: [  OCROSD][2744]utgdv:11:could not read reg value ocrmirrorconfig_loc os error= The system could not find the environment option that was entered.

Location:  I:\oracle\product\10.2.0\crs\log\rac3\crsd

This is the most important log of all (Cluster Ready Services). Depending on the debug level, it should show all relevant information about cluster resources (ASM, database, listener, ONS, etc.), including why they failed to start, and it keeps a running record of all failures. When you start the crsd service, check this log for any warnings or errors.

Location:  I:\oracle\product\10.2.0\crs\log\rac3\css

Cluster stack log for the CSS service which gets started before CRSD service.

Location:  I:\oracle\product\10.2.0\crs\log\rac3\evmd

Cluster event management log for the EVMD service which gets started after CSS but before CRSD.

Location:  I:\oracle\product\10.2.0\crs\log\rac3\racg

This folder has the log files for the VIP service (including VIPs of other nodes that failed over to this node) as well as the main database service log, ora.RACDB.db.log, which covers high availability and monitoring for all RAC instances.

 

Location:  I:\oracle\product\10.2.0\crs\BIN\OOBJService.log

This is the log for the first Windows service to start, Oracle Object Service, which is responsible for the links to shared storage (OCR disk, voting disks, etc.). Look in this log, even if the Windows service starts fine, for any errors relating to access to the shared storage from a node.

 

You can control the information being logged with various trace levels.

 

crsctl debug statedump crs will dump status of crsd

 

Suppose you want to debug specific modules for a service. The first thing you should do is find out all of the modules related to that service. For example:

 

crsctl lsmodules css will list:

     CSSD

COMMCRS

COMMNS

 

crsctl lsmodules crs will list:

     CRSUI

CRSCOMM

CRSRTI

CRSMAIN

CRSPLACE

CRSAPP

CRSRES

CRSCOMM

CRSOCR

CRSTIMER

CRSEVT

CRSD

CLUCLS

CSSCLNT

COMMCRS

COMMNS

 

crsctl lsmodules evm will list:

EVMD

EVMDMAIN

EVMCOMM

EVMEVT

EVMAPP

EVMAGENT

CRSOCR

CLUCLS

CSSCLNT

COMMCRS

COMMNS

Now suppose you want to debug the css modules at level 5 (detailed info):

crsctl debug log css CSSD:5,COMMCRS:5,COMMNS:5

 

crsctl debug log crs "CRSRTI:1,CRSCOMM:2"

 

crsctl debug log res ora.RACDB.db:5

crsctl debug log res ora.RACDB.RACDB1.inst:5
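The module:level argument in the commands above is easy to mistype when many modules are involved. As a small sketch (not an Oracle tool), the following Python helper assembles the command line from a mapping of module names to levels. The function name debug_log_command is hypothetical; it only builds the string and does not run crsctl.

```python
# Targets that crsctl debug log accepts in the examples of this section.
VALID_TARGETS = ("css", "crs", "evm", "res")


def debug_log_command(target, levels):
    """Build a 'crsctl debug log' command line from {module: level} pairs."""
    if target not in VALID_TARGETS:
        raise ValueError(f"unknown target: {target}")
    if not levels:
        raise ValueError("at least one module:level pair is required")
    # Join the pairs in insertion order, e.g. CSSD:5,COMMCRS:5
    spec = ",".join(f"{module}:{int(level)}" for module, level in levels.items())
    return f'crsctl debug log {target} "{spec}"'


# Module names taken from the 'crsctl lsmodules css' listing above.
command = debug_log_command("css", {"CSSD": 5, "COMMCRS": 5, "COMMNS": 5})
print(command)
```

The module names themselves should always come from crsctl lsmodules on your own installation.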

 

OCR logging:

Uncomment the settings in: I:\oracle\product\10.2.0\crs\srvm\admin\ocrlog.ini

 

Database Services Log

Location:  I:\oracle\product\10.2.0\db_1\RDBMS\log\ipcdbg.racdb2.log

Here you can find information about the Cache Fusion communication channel over the private interface card. Look here when there is an issue between nodes on the private interconnect.

 

Location: 

I:\oracle\product\10.2.0\db_1\log\rac3\racg\imon.log

I:\oracle\product\10.2.0\db_1\log\rac3\racg\imon_RACDB.log

Instance monitor/RACGIMON logs

 

Location:  I:\oracle\product\10.2.0\db_1\log\rac3\*

This location holds several logs, and it is worthwhile to look here when there is an issue with the cluster database. For example, you will find logs about the OCR failing to initialize, such as I:\oracle\product\10.2.0\db_1\log\rac3\client\ocrconfig_1064.log, and ocrcheck_600.log, which holds useful information every time you run the ocrcheck utility.

 

Backup and Recovery – RAC Environment

Here I will discuss backing up the RAC environment, including all of its components: cluster and database. As far as the OS and Oracle software layers are concerned, you should have a cold system backup that includes the OS and Oracle software mount points.

 

Backup and Recovery for clustered database

 

You can always use export to back up the database or a specific schema; this does not differ between single instance and RAC, and I would not consider export a replacement for standard backup procedures. Export is useful when your requirements are closer to the application level, for specific objects. Therefore do not consider export as your backup strategy for RAC, or even for a single-instance non-RAC database.

 

RMAN should always be The Choice when considering backup strategies for a RAC Database environment.

 

There are many articles on Metalink that cover RAC backup and recovery procedures and commands. My viewpoint is that a DBA needs to understand backup and recovery concepts in a RAC environment more than the differences in the actual commands. I have always maintained RAC databases in such a way that I did not have to issue different backup/recovery commands for single instance vs. RAC. The key is to always define the archived log location on shared storage (where the rest of the data files reside). Backup commands need specific options only when your archived logs are kept locally on each node.

 

For example, take a look at the following test case where archived logs are defined at a shared storage accessible to both nodes.

 

1.               Create a folder in shared storage to hold your backup sets.

set ORACLE_SID=+ASM1

set ORACLE_HOME=I:\oracle\product\10.2.0\db_1

asmcmd -p

cd DATA

cd RACDB

mkdir BACKUP

'+DATA/RACDB/BACKUP' is the shared backup location. If you are using a tape device instead, make sure to use an MML such as Veritas and have the backup registered in Veritas as well as in the RMAN repository.

 

2.               Point your archived logs to be created at shared storage.

alter system set log_archive_dest='+DATA' scope=both

 

3.               Take full database backup

run

{

change archivelog all crosscheck;

backup archivelog all format = 'i:\oracle\archbkup%u' delete input;

}

run

{

delete obsolete;

}

configure CONTROLFILE AUTOBACKUP on;

run

{

backup as compressed backupset database format = '+DATA/RACDB/BACKUP/FULLB1%u';

}

RMAN> list backup of database;

 

4.               Create test data

show parameter db_create_file_dest

NAME                                 TYPE        VALUE

------------------------------------ ----------- -----------

db_create_file_dest                  string      +DATA

drop tablespace tbs_test1 including contents and datafiles;

CREATE TABLESPACE tbs_test1 DATAFILE SIZE 20M;

drop user user1;

create user user1 identified by user1;

alter user user1 default tablespace tbs_test1;

create table dept (id number);

--insert some values

Now switch log files from both nodes (alter system switch logfile;) and note down the archived logs that were created:

SELECT name, thread#, sequence#, completion_time FROM gV$ARCHIVED_LOG

where completion_time > TO_DATE('21-JUL-2007 12:00:31', 'DD-MON-YYYY HH24:MI:SS')

and name is not null

ORDER BY SEQUENCE# DESC;

 

5.               Simulate Crash

Shutdown database (only on windows)

From ASMCMD, remove the datafile for the tbs_test1 tablespace.

Startup database in mount state;

alter database datafile 7 offline;

alter database open;

select * from v$recover_file

exit;

rman target /

restore datafile 7;

recover datafile 7;

sql 'alter database datafile 7 online';

 

As you can see in the above example, we did not have to specify any extra commands for the RAC environment because our archived logs are located in the common storage.

 

However if you decide to put archived logs locally, these are the changes to be aware of:

 

run {

allocate channel d1 type disk connect 'sys/rac@node1';

allocate channel d2 type disk connect 'sys/rac@node2';

…

}

 

Or

CONFIGURE CHANNEL 1 DEVICE TYPE DISK connect 'SYS/rac@node1';

CONFIGURE CHANNEL 2 DEVICE TYPE DISK connect 'SYS/rac@node2';

…

Now run your commands as normal; each channel works in the scope of the node named in its connect string. Two Metalink notes deal with recovery scenarios for RAC environments: Note:207059.1 and Note:220970.1. Note 207059.1 is an old one that covers 9i and Parallel Server, but it does clear up some concepts about raw devices.

Backup and Recovery for OCR & VOTING disks

 

OCR Backup and Recovery

Reference: Metalink Note: 220970.1

The OCR raw device/file is backed up automatically every four hours on the master RAC node, to the default location:    $ORA_CRS_HOME\cdata\<clustername>\

To display backups :       ocrconfig -showbackup

To restore a backup :      ocrconfig -restore <backup_file>

The automatic backup mechanism keeps copies up to about a week old.

To take a logical copy of the OCR at any time, use ocrconfig -export <file>, and use ocrconfig -import <file> to restore the contents.

 

OCR is the Oracle Cluster Registry; it holds all cluster-related information, such as instances and services. The OCR file format is binary, and starting with 10.2 it is possible to mirror it. On Unix, the file locations are recorded in /etc/oracle/ocr.loc, in the ocrconfig_loc and ocrmirrorconfig_loc variables (on Windows, these values are kept in the registry).

 

Obviously if you only have one copy of the OCR and it is lost or corrupt then you must restore a recent backup, see ocrconfig utility for details, specifically -showbackup and -restore flags. Until a valid backup is restored the Oracle Clusterware will not startup due to the corrupt/missing OCR file.

 

The interesting discussion is what happens if you have the OCR mirrored and one of the copies gets corrupt? You would expect that everything will continue to work seamlessly. Well.. Almost.. The real answer depends on when the corruption takes place.

 

If the corruption happens while the Oracle Clusterware stack is up and running, the corruption will be tolerated and the Oracle Clusterware will continue to function without interruption, despite the corrupt copy. The DBA is advised to repair the hardware/software problem that prevents OCR from accessing the device as soon as possible; alternatively, the DBA can replace the failed device with a healthy one using the ocrconfig utility with the -replace flag.

 

If however the corruption happens while the Oracle Clusterware stack is down, then it will not be possible to start it up until the failed device becomes online again or some administrative action using ocrconfig utility with -overwrite flag is taken. When the Clusterware attempts to start you will see messages similar to:

 

total id sets  (1), 1st set (1669906634,1958222370), 2nd set (0,0) my votes (1), total votes  (2)

2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprioini:disk 0  (/dev/raw/raw1) doesn't have enough votes (1,2)

2006-07-12 10:53:54.301: [OCRRAW][1210108256]proprseterror: Error in  accessing physical storage [26]

 

This is because the software can't determine which OCR copy is the valid one. In the above example one of the OCR mirrors was lost while the Oracle Clusterware was down. There are 3 ways to fix this failure:

 

a) Fix whatever problem (hardware/software?) prevents OCR from accessing the device.

 

b) Issue "ocrconfig -overwrite" on any one of the nodes in the cluster. This command overrides the vote check built into OCR at startup. When the OCR is configured with a mirror, each device is assigned one vote. The rule is that more than 50% of the total votes (a quorum) must be present to be sure that the available devices contain the latest data. With 2-way mirroring the total vote count is 2, so both votes are required to achieve quorum. In the example above there are not enough votes to start, because only one device with one vote is available. (In the earlier case, where the device fails while OCR is running, OCR assigns 2 votes to the surviving device, which is why that device, now holding two votes, can start the cluster after it has been shut down.)

 

c) This method is not recommended for customers. It is possible to manually modify ocr.loc to delete the failed device and restart the cluster. OCR will not do the vote check if the mirror is not configured.
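The vote rule described in (b) can be checked with a little arithmetic. The following is a minimal Python sketch of the "more than 50% of the total votes" rule, not Oracle's implementation. The function name has_quorum is hypothetical, and the numbers come from the 2-way mirroring example above.

```python
def has_quorum(available_votes, total_votes):
    """Return True when the available devices hold more than 50% of all votes."""
    return available_votes * 2 > total_votes


# 2-way mirror, both devices present: 1 + 1 votes out of 2.
both_present = has_quorum(available_votes=2, total_votes=2)

# One mirror lost while the stack was down: only 1 vote out of 2.
lost_while_down = has_quorum(available_votes=1, total_votes=2)

# One mirror lost while the stack was up: the survivor was given 2 votes.
lost_while_up = has_quorum(available_votes=2, total_votes=2)

print(both_present, lost_while_down, lost_while_up)
```

The middle case is the one that produces the "doesn't have enough votes (1,2)" message shown earlier, since one vote is exactly 50% and the rule needs more than that.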

 

How to move the OCR location:

1. Stop the CRS stack on all nodes.

2. Edit /var/opt/oracle/ocr.loc (or the Windows registry) on all nodes and set ocrconfig_loc to the new OCR device.

3. Restore from one of the automatic physical backups using ocrconfig -restore.

4. Run ocrcheck to verify.

5. Reboot to restart the CRS stack.

OCR locations can also be changed with ocrconfig:

ocrconfig -replace ocr|ocrmirror [<filename>]

 

In short, these are the commands to administer the OCR.

 

To add or replace the OCR:

ocrconfig -replace ocr destination_file or disk

To add or replace the OCR mirror:

ocrconfig -replace ocrmirror destination_file or disk

 

Repairing the OCR:

ocrconfig -repair ocrmirror device_name

 

To remove an OCR location, at least one other OCR location must remain online. Run the replace command without a destination:

ocrconfig -replace ocr OR ocrconfig -replace ocrmirror

 

 

 

Voting Disk Backup and Recovery

On Unix, you can use the following to backup voting disks:

dd if=voting_disk_name of=backup_file_name

 

In Windows environments you can use the ocopy command to copy the files, along with the crsctl commands to administer them.

 

List existing voting disks:

crsctl query  css votedisk

 

To delete existing voting disk:

       crsctl delete css votedisk path

 

To add another voting disk:

            crsctl add css votedisk path

 

The above command should be run while CRS is up; if CRS is down, use the force option: crsctl add css votedisk path -force

 

 

Test Case:

Let's apply what we have learned to the RAC environment we created earlier.

 

Adding 3rd voting disk:

Our extended RAC environment already has one voting disk for RAC1 & RAC3 nodes. I would like to add a third voting disk on the RAC5 (third) node. From the VMWare settings of the RAC5 node, create a new pre-allocated virtual disk (IDE) of 300MB. Run ASMTOOLG before and after adding the disk so that you can identify the raw partition name. Then run GUIOracleOBJManager.exe under CRS HOME/bin to assign the logical (link) name VOTEDSK3 to the new candidate disk, as shown in the following figure.

 

Now share the D:\RACVIRTUAL\RAC5 folder on XPWS5 with XPWS1 and XPWS2, with full access. On XPWS1, map a logical Y: drive to the share, and do the same on XPWS2. Then, from VMWare on both workstations (XPWS1 and XPWS2), add an existing virtual disk that points to Y:\votedisk3.vmdk.

Reboot RAC1, RAC3 and RAC5, and then use GUIOracleOBJManager.exe to check that the new 300MB disk is visible.

Now, from the RAC1 node, make sure all cluster services are DOWN and verify this with the command crs_stat -t. There is a bug, fixed in 10.2.0.4, that crashes the CRS stack if voting disks are added online; therefore you need to use the force option:

 

From the node RAC5:

I:\oracle\product\10.2.0\crs\BIN>crsctl add css votedisk \\.\votedsk3

Cluster is not in a ready state for online disk addition

 

I:\oracle\product\10.2.0\crs\BIN>crsctl add css votedisk \\.\votedsk3 -force

Now formatting voting disk: \\.\votedsk3

successful addition of votedisk \\.\votedsk3.

Verify this from all nodes by running crsctl query css votedisk

Now start the cluster node rac1 with all services and it should be up and running.

 

Taking a backup of voting disk:

Shut down all cluster services across nodes. Use the Oracle-supplied ocopy command to take a backup, as shown below:

From RAC5:

I:\oracle\product\10.2.0\crs\BIN>crsctl query css votedisk

 0.     0    \\.\votedsk1

 1.     0    \\.\votedsk2

 2.     0    \\.\votedsk3

 

located 3 votedisk(s).

 

I:\oracle\product\10.2.0\crs\BIN>ocopy

OCOPY v2.0 - Copyright 1989-1993 Oracle Corp.  All rights reserved.

Usage:

    ocopy from_file [to_file [a | size_1 [size_n]]]

    ocopy -b from_file to_drive

    ocopy -r from_drive to_dir

 

I:\oracle\product\10.2.0\crs\BIN>ocopy \\.\votedsk3 votedsk3.bak

VOTEDSK3.BAK

 

I:\oracle\product\10.2.0\crs\BIN>
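Before copying the voting disks, it can be handy to turn the crsctl query css votedisk listing into a list of device paths, one ocopy target each. The following is a minimal Python sketch that parses the output exactly as shown in the transcript above. The function name parse_votedisks is hypothetical, and the layout of the listing is assumed from this one sample.

```python
import re

# Output of 'crsctl query css votedisk' as captured in the transcript above.
SAMPLE_OUTPUT = r"""
 0.     0    \\.\votedsk1
 1.     0    \\.\votedsk2
 2.     0    \\.\votedsk3

located 3 votedisk(s).
"""

# One disk per line: index, a number, then the device path.
DISK_LINE = re.compile(r"^\s*(\d+)\.\s+(\d+)\s+(\S+)\s*$")
COUNT_LINE = re.compile(r"^located\s+(\d+)\s+votedisk")


def parse_votedisks(output):
    """Return (paths, reported_count) parsed from the listing."""
    paths = []
    reported = None
    for line in output.splitlines():
        disk = DISK_LINE.match(line)
        if disk:
            paths.append(disk.group(3))
            continue
        count = COUNT_LINE.match(line.strip())
        if count:
            reported = int(count.group(1))
    return paths, reported


paths, reported = parse_votedisks(SAMPLE_OUTPUT)
for path in paths:
    # Backup file name derived from the link name, e.g. votedsk3.bak
    print("ocopy", path, path.split("\\")[-1] + ".bak")
```

Comparing the number of parsed paths with the reported count is a cheap check that no line of the listing was missed.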

Restoring a backup of voting disk:

Suppose you lost one voting disk/device: follow the same procedure as described above to re-create the voting disk. If, however, you lost all of your voting disks but have a backup, follow the procedure described earlier to create a new raw voting disk/device, up to the point where you assign the link name. Then run the restore as shown below:

I:\oracle\product\10.2.0\crs\BIN>ocopy votedsk3.bak \\.\votedsk3

\\.\VOTEDSK3

 

Changing Location of Voting disk:

Use the add method described above.

 

What to do when OCR/Voting disks are lost and there is no backup:

Reference Metalink ID: 399482.1

 

Next…

     I thought of publishing this document as it stands for now, and I will

     create additional articles on Performance Tuning and Failover

     strategies. For further details please visit the Resource Section.

References

http://www.oracle.com/technology/products/database/clustering/index.html

 

Copyright © 2007 www.OracleFusions.com All rights reserved. For Educational Purpose Only
The information contained in this document represents my personal view on the issues discussed as of the date of publication, and I can not guarantee the accuracy of any information presented after the date of publication.
This document is for informational purposes only. I MAKE NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS DOCUMENT.
For further information, please contact at Support@OracleFusions.com  

