
Friday, June 08, 2007

Losing One Voting Disk

Voting disks are used in a RAC configuration to maintain node membership, and they are critical components of a cluster. Starting with Oracle 10gR2, it is possible to mirror the OCR and the voting disks. With the default mirroring template, which defines three voting disks, at least two of them must remain accessible for the cluster to function normally.
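The majority rule behind the figure of two can be sketched in a few lines of shell. The function names are hypothetical, and the arithmetic assumes the usual Clusterware rule that a node must be able to access more than half of the configured voting disks.

```shell
# Hypothetical helpers; not part of any Oracle tool.

# Minimum number of voting disks a node must reach (strict majority).
votedisk_quorum() {
    total=$1
    echo $(( total / 2 + 1 ))
}

# Number of voting disk failures that can be tolerated with $1 disks.
votedisk_tolerated_failures() {
    total=$1
    echo $(( total - (total / 2 + 1) ))
}

votedisk_quorum 3               # prints 2
votedisk_tolerated_failures 3   # prints 1
```

With a single voting disk the same functions give a quorum of 1 and no tolerated failures, which is why an unmirrored configuration cannot survive the loss of its only voting disk.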

Scenario Setup

This scenario simulates the crash of one voting disk using the following steps:

  1. Identify the voting disks:

crsctl query css votedisk

0. 0 /dev/raw/raw1

1. 0 /dev/raw/raw2

2. 0 /dev/raw/raw3

  2. Corrupt one of the voting disks (as root):

    dd if=/dev/zero of=/dev/raw/raw3 bs=1M
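The listing from step 1 can also be captured in a script, which makes it harder to mistype a device name later. The following is a minimal sketch that assumes the output layout shown in step 1 (an index, a numeric column, then the path). `list_votedisks` is a hypothetical helper name, and the sample text repeats the listing above instead of calling `crsctl`.

```shell
# Hypothetical helper: print the path column of `crsctl query css votedisk`
# output read from stdin. Assumes lines of the form " 0.  0  /dev/raw/raw1";
# any other line is ignored.
list_votedisks() {
    awk '$1 ~ /^[0-9]+\.$/ && NF >= 3 { print $NF }'
}

# Sample input copied from the listing in step 1. On a real cluster this
# would be: crsctl query css votedisk | list_votedisks
sample_output=' 0.     0    /dev/raw/raw1
 1.     0    /dev/raw/raw2
 2.     0    /dev/raw/raw3'

printf '%s\n' "$sample_output" | list_votedisks
```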

Recoverability Steps

  1. Check the “$CRS_HOME/log/[hostname]/alert[hostname].log” file. It should contain a message like the following, which identifies the voting disk that became corrupted:

    [cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/aut-arz-ractest1/cssd/ocssd.log.

  2. According to the message above, Voting1 is the corrupted disk. Shut down the CRS stack:

    srvctl stop database -d fitstest -o immediate

    srvctl stop asm -n aut-vie-ractest1

    srvctl stop asm -n aut-arz-ractest1

    srvctl stop nodeapps -n aut-vie-ractest1

    srvctl stop nodeapps -n aut-arz-ractest1

    crs_stat -t

    On every node as root:

    crsctl stop crs

  3. Pick a good voting disk from the remaining ones and copy it over the corrupted one:

    dd if=/dev/raw/raw1 of=/dev/raw/raw3 bs=1M

  4. Start CRS (on every node as root):

      crsctl start crs

  5. Check the log file “$CRS_HOME/log/[hostname]/alert[hostname].log”. It should look like this:

    [cssd(14463)]CRS-1601:CSSD Reconfiguration complete. Active nodes are aut-vie-ractest1 aut-arz-ractest1 .

    2007-05-31 15:19:53.954

    [crsd(14268)]CRS-1012:The OCR service started on node aut-vie-ractest1.

    2007-05-31 15:19:53.987

    [evmd(14228)]CRS-1401:EVMD started on node aut-vie-ractest1.

    2007-05-31 15:19:55.861

    [crsd(14268)]CRS-1201:CRSD started on node aut-vie-ractest1.

  6. After a couple of minutes check the status of the whole CRS stack:

    [oracle@aut-vie-ractest1 ~]$ crs_stat -t

    Name           Type           Target    State     Host
    ------------------------------------------------------------
    ora....SM2.asm application    ONLINE    ONLINE    aut-...est1
    ora....T1.lsnr application    ONLINE    ONLINE    aut-...est1
    ora....st1.gsd application    ONLINE    ONLINE    aut-...est1
    ora....st1.ons application    ONLINE    ONLINE    aut-...est1
    ora....st1.vip application    ONLINE    ONLINE    aut-...est1
    ora....SM1.asm application    ONLINE    ONLINE    aut-...est1
    ora....T1.lsnr application    ONLINE    ONLINE    aut-...est1
    ora....st1.gsd application    ONLINE    ONLINE    aut-...est1
    ora....st1.ons application    ONLINE    ONLINE    aut-...est1
    ora....st1.vip application    ONLINE    ONLINE    aut-...est1
    ora....test.db application    ONLINE    ONLINE    aut-...est1
    ora....t1.inst application    ONLINE    ONLINE    aut-...est1
    ora....t2.inst application    ONLINE    ONLINE    aut-...est1
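The first recovery step, finding the offline voting file in the alert log, can be scripted as well. This is a rough sketch that assumes the CRS-1604 message format quoted in step 1. `offline_votedisks` is a hypothetical helper, and the sample line is the one shown in that step.

```shell
# Hypothetical helper: extract the voting file paths reported offline
# (CRS-1604) from alert log text on stdin. Assumes the message format
# quoted in step 1; lines that do not match print nothing.
offline_votedisks() {
    sed -n 's/.*CRS-1604:CSSD voting file is offline: \([^ ]*\)\. Details.*/\1/p'
}

# Sample line copied from step 1. On a real node the input would be the
# alert log itself, for example:
#   offline_votedisks < "$CRS_HOME/log/<hostname>/alert<hostname>.log"
sample_line='[cssd(9120)]CRS-1604:CSSD voting file is offline: /opt/oracle/product/10.2.0/crs_1/Voting1. Details in /opt/oracle/product/10.2.0/crs_1/log/aut-arz-ractest1/cssd/ocssd.log.'

printf '%s\n' "$sample_line" | offline_votedisks
```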


Note: It is also possible to recover a lost voting disk from an old voting disk backup, and to perform the “dd” command without shutting down the CRS stack.
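A minimal sketch of the backup-and-restore idea from the note, using dd in both directions. The helper names and the temporary files are illustrative assumptions: on a real cluster the source would be a voting device such as /dev/raw/raw3, the commands would run as root, and the backup location is yours to choose.

```shell
# Hypothetical helpers wrapping dd; the names are illustrative only.
backup_votedisk() {    # usage: backup_votedisk <voting_disk> <backup_file>
    dd if="$1" of="$2" bs=1M 2>/dev/null
}
restore_votedisk() {   # usage: restore_votedisk <backup_file> <voting_disk>
    dd if="$1" of="$2" bs=1M 2>/dev/null
}

# Demonstration on ordinary temporary files standing in for a raw device.
workdir=$(mktemp -d)
printf 'voting-disk-contents' > "$workdir/votedisk"
backup_votedisk "$workdir/votedisk" "$workdir/votedisk.bak"

printf 'garbage' > "$workdir/votedisk"    # simulate the corruption
restore_votedisk "$workdir/votedisk.bak" "$workdir/votedisk"

# cmp -s is silent and succeeds only if both files are byte-identical.
cmp -s "$workdir/votedisk" "$workdir/votedisk.bak" && echo "restored copy matches backup"
```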

11 comments:

Girish said...

Hi,

I have the following questions regarding RAC:

1. If a voting disk is unavailable, does RAC function normally or does it fail?

2. If the OCR is unavailable, does RAC function normally or does it fail?

3. During crash recovery in RAC, do the voting disk and OCR play any role, i.e. is it possible to do recovery if the voting disk or OCR is missing?

Thanks
Girish

Alexandru Tică said...

Hi,

1. If you have just one voting disk and you lose it, RAC will fail. If you configured redundancy for the voting disks (which means you have 3 voting disks), then you can lose one voting disk and your RAC will still be up and running. If you lose two voting disks simultaneously, your RAC will stop running.

2. The same goes for the OCR. If you have just one OCR and you lose it, your RAC will fail. However, if you have redundancy for the OCR (which means you have two OCRs), then losing one OCR is not such a big deal.

3. If you're asking about database recovery, you can recover without the voting disks and the OCR. However, these components must be taken into consideration in your main backup strategy. Starting with 11gR2, backing up the voting disks yourself is no longer necessary, as they are backed up automatically.

Anonymous said...

"Using the default mirroring template, the minimum number of voting disks necessary for a normal functioning is two."

Please clarify the above statement, because the Oracle documentation says that you need an odd number of voting disks to avoid split-brain syndrome.

Can you describe in detail how instances find out whether another node is down?

Thanks
Vinod

Alexandru Tică said...

Vinod,

You still have to define 3 voting disks, but if one of them fails, RAC will continue to run with the 2 remaining voting disks.

Regarding the way RAC finds out which nodes are down, the mechanism is quite complex. First of all, the clusterware checks node membership using various heartbeats through the interconnect. The voting disks are also used to heartbeat the member nodes as a second check. This check does not depend on the interconnect and relies instead on the shared nature of the voting disk.

Basically, every node operates using a so-called "membership bitmap". Every node provides the membership information that it presumes to be correct. This information is written every 3 seconds by the CKPT process of every instance into the database control file. The master instance of the cluster gathers all the votes and decides whether there is a "split brain" issue. For example, in a 3-node RAC the "membership votes" may be: n1 => 101; n2 => 010; n3 => 101. Counting the votes reveals a score of 2 - 1. The second node has a different image of the cluster and will simply be evicted from the RAC configuration.
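The vote count in that example can be reproduced with a toy script. This only illustrates the 2 - 1 tally described in the reply; it is not Oracle's actual eviction algorithm. `evicted_nodes` and the input format (node name, then bitmap) are assumptions made for the sketch.

```shell
# Toy illustration: each input line is "<node> <membership bitmap>".
# The bitmap reported by the most nodes wins, and nodes reporting any
# other bitmap are printed as eviction candidates. Ties are not handled.
evicted_nodes() {
    awk '
        { node[NR] = $1; view[NR] = $2 ""; count[$2 ""]++ }
        END {
            best = ""; max = 0
            for (v in count) if (count[v] > max) { max = count[v]; best = v }
            for (i = 1; i <= NR; i++) if (view[i] != best) print node[i]
        }
    '
}

printf 'n1 101\nn2 010\nn3 101\n' | evicted_nodes   # prints n2
```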

Gas said...

Hi, I have one question:
I was installing a 3-node RAC (10.2.0) and didn't configure redundancy for the voting disk (it's just a test environment), but RAC never comes up. At the moment I'm experiencing the issue in this article, but I don't have backups or other voting disks. What can I do?
Regards!

Alexandru Tică said...

Hi Gas,

Have a look at Metalink Note 399482.1.

I haven't tried this, but according to the above note: "If there are multiple voting disks and one was accidentally deleted, then check if there are any backups of this voting disk. If there are no backups then we can add one using the crsctl add votedisk command. The complete steps are in the Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide".

Gas said...

I already followed that note, but when I run root.sh on the first node, it fails saying "vote disk is offline". I checked the log and was faced with a large number of things that I don't understand...
Is it an option to "erase" the voting disk and re-run root.sh?
What do you think?

Anonymous said...

Thank you so much for the blog.
I have a question. I have three cluster file systems which host the voting and OCR disks. One of them went offline due to a storage issue, and the FS got unmounted.
The cluster continued to run because we had the other two file systems, which had voting and OCR disks.

The storage admins brought the FS back, and when I run crsctl query css votedisk it shows online, and ocrcheck completes successfully; even the logical corruption check comes out fine.

Can I leave this as it is, or do I have to delete and re-add the voting disk and OCR?

No changes happened while one of the file systems was offline.
I appreciate your comments.

Alexandru Tică said...

Hi,

If no corruption errors are reported, then it's fine to go on with the current configuration.

Anonymous said...

Thanks, Alex. I will go without re-creating the OCR and voting disk.

rahul said...

What happens if a voting disk suddenly becomes corrupt?