Friday, June 08, 2007

Losing One Voting Disk

Voting disks are used in a RAC configuration to maintain node membership. They are critical components of a cluster configuration. Starting with Oracle 10gR2, it is possible to mirror the OCR and the voting disks. With normal redundancy (three voting disks), at least two of them must remain accessible for the cluster to keep functioning normally.
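The "two out of three" figure follows from a majority rule: each node must be able to access more than half of the configured voting disks. A small arithmetic sketch of that rule (the function names are illustrative only, not Oracle commands):

```shell
# Illustrative helpers (hypothetical names) for the majority rule:
# with n configured voting disks, a node needs floor(n/2) + 1 of them.
voting_quorum() {
  echo $(( $1 / 2 + 1 ))
}

# How many voting disks can be lost while a majority is still reachable.
voting_tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# voting_quorum 3              -> 2
# voting_tolerated_failures 3  -> 1
```

This is also why an odd number of voting disks is used: going from three to four disks raises the quorum to three without tolerating any additional failure.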

Scenario Setup

This scenario simulates the crash of one voting disk using the following steps:

  1. Identify the voting disks:

    crsctl query css votedisk

    0.     0    /dev/raw/raw1
    1.     0    /dev/raw/raw2
    2.     0    /dev/raw/raw3

  2. Corrupt one of the voting disks (as root):

    dd if=/dev/zero of=/dev/raw/raw3 bs=1M
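Step 1 can be scripted so that later steps work from the actual device list rather than hand-typed paths. A minimal sketch, assuming each disk line of `crsctl query css votedisk` has the form shown above (an index ending in a dot, a number, then the path); `list_voting_disks` is a hypothetical helper name:

```shell
# Hypothetical helper: print the voting disk paths from
# `crsctl query css votedisk` output read on stdin.
# Assumes disk lines look like " 0.     0    /dev/raw/raw1";
# any other line is skipped.
list_voting_disks() {
  awk '$1 ~ /^[0-9]+\.$/ && NF >= 3 { print $NF }'
}

# Usage:
#   crsctl query css votedisk | list_voting_disks
```

Checking the target of a destructive `dd` against this list is a cheap guard against overwriting a device that is not a voting disk.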

Recoverability Steps

  1. Check the “$CRS_HOME/log/[hostname]/alert[hostname].log” file. A message like the following should appear there, identifying which voting disk became corrupted:

    [cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/aut-arz-ractest1/cssd/ocssd.log.

  2. The message names the voting file that went offline (Voting1 in the excerpt above); that is the corrupted disk. Shut down the database, ASM, the node applications and then the CRS stack:

    srvctl stop database -d fitstest -o immediate

    srvctl stop asm -n aut-vie-ractest1

    srvctl stop asm -n aut-arz-ractest1

    srvctl stop nodeapps -n aut-vie-ractest1

    srvctl stop nodeapps -n aut-arz-ractest1

    crs_stat -t

    On every node as root:

    crsctl stop crs

  3. Pick a healthy voting disk from the remaining ones and copy it over the corrupted one:

    dd if=/dev/raw/raw1 of=/dev/raw/raw3 bs=1M

  4. Start CRS (on every node as root):

      crsctl start crs

  5. Check the “$CRS_HOME/log/[hostname]/alert[hostname].log” file again. It should contain entries like the following:

    [cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are aut-vie-ractest1 aut-arz-ractest1 .
    2007-05-31 15:19:53.954
    [crsd(14268)]CRS-1012:The OCR service started on node aut-vie-ractest1.
    2007-05-31 15:19:53.987
    [evmd(14228)]CRS-1401:EVMD started on node aut-vie-ractest1.
    2007-05-31 15:19:55.861
    [crsd(14268)]CRS-1201:CRSD started on node aut-vie-ractest1.

  6. After a couple of minutes, check the status of the whole CRS stack:

    [oracle@aut-vie-ractest1 ~]$ crs_stat -t
    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....SM2.asm application    ONLINE    ONLINE    aut-...est1
    ora....T1.lsnr application    ONLINE    ONLINE    aut-...est1
    ora....st1.gsd application    ONLINE    ONLINE    aut-...est1
    ora....st1.ons application    ONLINE    ONLINE    aut-...est1
    ora....st1.vip application    ONLINE    ONLINE    aut-...est1
    ora....SM1.asm application    ONLINE    ONLINE    aut-...est1
    ora....T1.lsnr application    ONLINE    ONLINE    aut-...est1
    ora....st1.gsd application    ONLINE    ONLINE    aut-...est1
    ora....st1.ons application    ONLINE    ONLINE    aut-...est1
    ora....st1.vip application    ONLINE    ONLINE    aut-...est1
    ora....test.db application    ONLINE    ONLINE    aut-...est1
    ora....t1.inst application    ONLINE    ONLINE    aut-...est1
    ora....t2.inst application    ONLINE    ONLINE    aut-...est1
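The checks in recovery steps 1, 3, 5 and 6 can also be scripted. The following is a minimal sketch of hypothetical helper functions (none of these names come from Oracle). It assumes the alert log messages and the `crs_stat -t` layout look exactly like the excerpts in this post, and that the healthy and the corrupted voting devices have the same size. The copy is verified by comparing `cksum` output of both devices read through `dd`, on the assumption that raw devices are best read with block-sized I/O rather than opened directly by another tool.

```shell
# Hypothetical helpers for the recovery steps above. Message formats and
# the crs_stat -t layout are assumed to match the excerpts in this post.

# Step 1: print the voting files reported offline (CRS-1604) in an alert log.
offline_voting_files() {
  sed -n 's/.*CRS-1604:CSSD voting file is offline: \([^ ]*\)\. Details in.*/\1/p' "$1" | sort -u
}

# Step 3: copy a healthy voting disk over a corrupted one, then verify.
# Run only while CRS is stopped on every node.
copy_voting_disk() {
  local src="$1" dst="$2"
  if [ "$src" = "$dst" ]; then
    echo "refusing to copy $src onto itself" >&2
    return 1
  fi
  dd if="$src" of="$dst" bs=1M || return 1
  # Compare checksums of both devices, each read through dd.
  [ "$(dd if="$src" bs=1M 2>/dev/null | cksum)" = \
    "$(dd if="$dst" bs=1M 2>/dev/null | cksum)" ]
}

# Step 5: check that the restart messages shown above appear in the log.
check_crs_started() {
  local log="$1" code rc=0
  for code in CRS-1601 CRS-1012 CRS-1401 CRS-1201; do
    if ! grep -q "$code:" "$log"; then
      echo "missing $code in $log" >&2
      rc=1
    fi
  done
  return "$rc"
}

# Step 6: print "name state" for every crs_stat -t resource that is not
# ONLINE (lines 1-2 are the header and the dashed separator).
offline_resources() {
  awk 'NR > 2 && NF >= 4 && $4 != "ONLINE" { print $1, $4 }'
}

# Usage sketch (the alert log name assumes the short host name):
#   log="$CRS_HOME/log/$(hostname -s)/alert$(hostname -s).log"
#   offline_voting_files "$log"
#   copy_voting_disk /dev/raw/raw1 /dev/raw/raw3   # as root, CRS down
#   check_crs_started "$log"
#   crs_stat -t | offline_resources                # no output: all ONLINE
```

`check_crs_started` only tests that the codes occur somewhere in the log, so entries from an earlier restart also satisfy it; the timestamps are still worth a look.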


Note: It is also possible to recover a lost voting disk from an older voting disk backup, and to run the “dd” command without shutting down the CRS stack.
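An older backup only helps if one exists. Oracle's 10g documentation describes taking voting disk backups with `dd`; the following is a minimal sketch of that idea, where `backup_voting_disks`, the backup directory and the timestamp suffix are hypothetical choices and the device paths are passed in explicitly:

```shell
# Hypothetical helper: dd backup of each voting disk into a directory,
# one timestamped file per disk (e.g. raw3.20070608120000).
backup_voting_disks() {
  local dest="$1" stamp disk
  shift
  stamp=$(date +%Y%m%d%H%M%S)
  mkdir -p "$dest" || return 1
  for disk in "$@"; do
    dd if="$disk" of="$dest/$(basename "$disk").$stamp" bs=1M || return 1
  done
}

# Usage (as root):
#   backup_voting_disks /backup/voting /dev/raw/raw1 /dev/raw/raw2 /dev/raw/raw3
# Restoring one disk is the reverse copy, e.g.:
#   dd if=/backup/voting/raw3.<stamp> of=/dev/raw/raw3 bs=1M
```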

2 Comments:

At 7:02 PM, Blogger Girish said...

Hi,

I have the following questions regarding RAC:

1. If the voting disk is unavailable, does RAC function normally or does it fail?

2. If the OCR is unavailable, does RAC function normally or does it fail?

3. During crash recovery in RAC, do the voting disk and OCR play any role, i.e. is it possible to do recovery if the voting disk or OCR is missing?

Thanks
Girish

At 9:25 AM, Blogger Alexandru Tica said...

Hi,

1. If you have just one voting disk and you lose it, RAC will fail. If you configured redundancy for the voting disks (which means you have 3 voting disks), you can lose one voting disk and your RAC will still be up and running. If you lose two voting disks simultaneously, your RAC will stop running.

2. The same applies to the OCR. If you have just one OCR and you lose it, your RAC will fail. However, if you have redundancy for the OCR (which means you have two OCRs), losing one OCR is not such a big deal.

3. If you're asking about database recovery, you can recover without the voting disks and the OCR. However, these components must be taken into account in your main backup strategy. Starting with 11gR2, backing up the voting disks manually is no longer necessary, as they are backed up automatically.
