First of all, I ran:
[grid@owl bin]$ cluvfy stage -pre nodeadd -n hen

Performing pre-checks for node addition

Checking node reachability...
Node reachability check passed from node "owl"

Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Node connectivity check passed

Checking CRS integrity...
CRS integrity check passed

Checking shared resources...
Checking CRS home location...
The location "/u01/app/11.2.0.2/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed

Checking node connectivity...
Checking hosts config file...
Verification of the hosts config file successful
Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"
Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"
Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "owl:/tmp"
Free disk space check passed for "hen:/tmp"
Check for multiple users with UID value 1100 passed
User existence check passed for "grid"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...
OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...
Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...
NTP Configuration file check started...
No NTP Daemons or Services were found to be running
Clock synchronization check using Network Time Protocol(NTP) passed

User "grid" is not part of "root" group. Check passed

Checking consistency of file "/etc/resolv.conf" across nodes
File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
The DNS response time for an unreachable node is within acceptable limit on all nodes
File "/etc/resolv.conf" is consistent across nodes

Checking GNS integrity...
The GNS subdomain name "vmrac.fits.ro" is a valid domain name
GNS VIP "poc-gns-vip.vmrac.fits.ro" resolves to a valid IP address
PRVF-5229 : GNS VIP is active before Clusterware installation
PRVF-5232 : The GNS subdomain qualified host name "hen.vmrac.fits.ro" was resolved into an IP address
GNS integrity check failed

Pre-check for node addition was unsuccessful on all the nodes.

PRVF-5229 is a really strange error: of course the GNS VIP is active, because I already have my RAC installed. The check makes sense when installing a new RAC, where the GNS VIP should still be unallocated, but otherwise I don't get it. So, I decided to go on even though the CVU was complaining.
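Just to reassure myself that PRVF-5229 was only complaining about the GNS VIP my existing cluster already owns, I took a quick look at the GNS resource itself. A minimal check, assuming it is run as the grid user from the existing GI home on owl (output omitted here):

# shows the registered GNS subdomain and the GNS VIP address
srvctl config gns

# confirms GNS, and therefore its VIP, is already running on one of the existing nodes
srvctl status gns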
The next step would be to run the addNode.sh script from the [GI_HOME]/oui/bin location. I ran the script and found that it does nothing if the CVU checks do not pass. You can figure this out if you run the script with debugging:
[grid@owl bin]$ sh -x ./addNode.sh -silent "CLUSTER_NEW_NODES={hen}"
+ OHOME=/u01/app/11.2.0.2/grid
+ INVPTRLOC=/u01/app/11.2.0.2/grid/oraInst.loc
+ ADDNODE='/u01/app/11.2.0.2/grid/oui/bin/runInstaller -addNode -invPtrLoc /u01/app/11.2.0.2/grid/oraInst.loc ORACLE_HOME=/u01/app/11.2.0.2/grid -silent CLUSTER_NEW_NODES={hen}'
+ '[' '' = Y -o '!' -f /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl ']'
+ CHECK_NODEADD='/u01/app/11.2.0.2/grid/perl/bin/perl /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl -pre -silent CLUSTER_NEW_NODES={hen}'
+ /u01/app/11.2.0.2/grid/perl/bin/perl /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl -pre -silent 'CLUSTER_NEW_NODES={hen}'
+ '[' 1 -eq 0 ']'
As you can see, the check_nodeadd.pl script ends with a non-zero exit code, which means an error (this Perl script really just runs the cluvfy utility, so it fails because of the GNS check). The only workaround I found was to skip this check by setting:
export IGNORE_PREADDNODE_CHECKS=Y
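For the record, you can reproduce the check by hand and look at the exit code yourself, which is exactly what addNode.sh does internally. A small sketch using the paths from the debug output above; the exit code naturally depends on your own cluvfy results:

# run the same pre-nodeadd check that addNode.sh runs internally
/u01/app/11.2.0.2/grid/perl/bin/perl \
  /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl -pre -silent 'CLUSTER_NEW_NODES={hen}'
echo $?    # anything other than 0 makes addNode.sh bail out silently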
After that I was able to successfully run the addNode.sh script:
[grid@owl bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES={hen}"
Starting Oracle Universal Installer...
... output truncated ...
Saving inventory on nodes (Friday, December 10, 2010 8:49:27 PM EET)
.                                                 100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes 'hen'.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node.
/u01/app/oraInventory/orainstRoot.sh #On nodes hen
/u01/app/11.2.0.2/grid/root.sh #On nodes hen
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node
The Cluster Node Addition of /u01/app/11.2.0.2/grid was successful.
Please check '/tmp/silentInstall.log' for more details.
Okay, GREAT! Let's run those scripts on the new node:
[root@hen app]# /u01/app/oraInventory/orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.
Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.

[root@hen app]# /u01/app/11.2.0.2/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]:
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.

Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
PROTL-16: Internal Error
Failed to create or upgrade OLR
Failed to create or upgrade OLR at /u01/app/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 6740.
/u01/app/11.2.0.2/grid/perl/bin/perl -I/u01/app/11.2.0.2/grid/perl/lib -I/u01/app/11.2.0.2/grid/crs/install /u01/app/11.2.0.2/grid/crs/install/rootcrs.pl execution failed
Oops! I did not see that coming! First of all, OLR?! Yeah, it's like the OCR, but local. The only note I found about this error was 1123453.1, which advises double-checking that all install prerequisites pass using cluvfy. In my case, the only problem I had was with the GNS check. Does GNS have anything to do with my error? As it turned out, no, it doesn't! The big mistake I made (and cluvfy didn't notice it) was that the SSH setup between the nodes was wrong. Connecting from owl to hen was okay, but not vice versa. After I fixed the SSH configuration, the root.sh script executed without any problems. Great!
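For the record, this is the kind of two-way test that would have caught the problem earlier. It's only a sketch, assuming passwordless SSH for the grid user and that owl and hen resolve from both sides:

# from owl: should return hen's hostname and date without any password prompt
ssh grid@hen "hostname; date"

# from hen: this was the broken direction in my case
ssh grid@owl "hostname; date"

# cluvfy can also verify user equivalence between the two nodes
cluvfy comp admprv -n owl,hen -o user_equiv -verbose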
The next step was to clone the database Oracle home. That was really easy: just run addNode.sh from the database home, in the same way I did for GI. So far so good... at this point I was expecting a little magic to happen. Look what the documentation says:
If you store your policy-managed database on Oracle Automatic Storage Management (Oracle ASM), Oracle Managed Files (OMF) is enabled, and if there is space in a server pool for node2, then crsd adds the Oracle RAC instance to node2 and no further action is necessary. If OMF is not enabled, then you must manually add undo and redo logs.
Hey, that's my case! Unfortunately, the new instance didn't show up. Furthermore, the server pool configuration was clearly asking for another node:
[oracle@hen oracle]$ srvctl config srvpool -g poc
Server pool name: poc
Importance: 10, Min: 2, Max: -1
Candidate server names:

Look, I have increased the importance level and set the "Min" property to 2. Damn it! I don't know why the new server was not automatically picked up; maybe it's also my lack of experience with this new server pool concept (a few commands I would use to dig into it are sketched after this paragraph). In the end I launched dbca from the newly added node, hoping that some new magic options had been added. But no... even the "Instance Management" option was disabled. However, if you choose "Configure database" and click next, next, next until the SYSDBA credentials are requested, dbca will try to connect to the local instance and it will actually create this new instance. I'm sure this is not the way it is supposed to work, but at least I could see some results.
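These are the commands I would use to see where the new server actually landed and, if nothing else helps, to name it explicitly as a candidate of the pool. The last step is just my assumed workaround, not something the documentation prescribes for this scenario, and it pins the pool to a fixed server list instead of letting Clusterware choose:

# which pool did the new server end up in (Free, Generic or poc)?
crsctl status server hen -f

# list the server pools together with their currently active servers
srvctl status srvpool -a

# last resort: explicitly set the candidate server list of the pool
srvctl modify srvpool -g poc -n owl,hen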
However, there was another interesting thing. Looking into the alert log of the newly created instance I found:

Could not open audit file: /u01/app/oracle/admin/poc/adump/poc_2_ora_18197_1.aud
Retry Iteration No: 1 OS Error: 2
Retry Iteration No: 2 OS Error: 2
Retry Iteration No: 3 OS Error: 2
Retry Iteration No: 4 OS Error: 2
Retry Iteration No: 5 OS Error: 2
OS Audit file could not be created; failing after 5 retries

I hadn't created the /u01/app/oracle/admin/poc/adump folder on my new node, and that was causing the error. So, this is another thing to remember... the addNode.sh cloning process does not automatically create the "adump" location.
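The fix is trivial: on the new node, create the audit destination before the instance needs it. A sketch; the oracle:oinstall ownership is the usual convention and an assumption about your setup:

# create the adump directory the instance expects on the new node
mkdir -p /u01/app/oracle/admin/poc/adump
chown oracle:oinstall /u01/app/oracle/admin/poc/adump
chmod 750 /u01/app/oracle/admin/poc/adump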
And, that's all! Now, my fancy RAC has a new baby node.
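A final sanity check from any node never hurts (just a sketch; poc, owl and hen are the names from my setup):

# the new node should be listed as an active cluster member
olsnodes -s

# Clusterware stack health across all nodes
crsctl check cluster -all

# and the database should now report a second instance, running on hen
srvctl status database -d poc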