This blog has moved here.

Tuesday, April 24, 2012

A simple way to configure a DNS for your Oracle playground RAC

Because we are talking about a test/playground environment, the first idea which comes to mind is virtualization. VirtualBox, Oracle VM, VMWare and, of course, others, all may be used to create a RAC playground, provided that you have a computer powerful enough to support this kind of setup.

But, despite the advantage of virtualization, you also have to think of how the new virtualized RAC will talk to the external world and how you're going to support the new SCAN feature available in Oracle 11g. Yeap, you guessed right! I'm talking about the integration with your DNS server. I will not cover here the GNS feature but just the SCAN address, which must be defined in your DNS to three different IPs, in a round-robin fashion.

Hmmm, damn it! My virtualized RAC is on my home desktop, I have a stupid wireless rooter without any possibility to define DNS entries there, and installing a new machine just for a DNS server seems to be a awfully heavyweight solution. In addition, using BIND, the well know DNS server, is not very appealing for me because I'm not a network administrator and I simply don't get it. But, if you plan to stick with BIND you may find an example here. Of course, being a playground, you may rely to the /etc/hosts file but you'll receive warnings on install and it doesn't feel right, does it?

Ok, introducing MaraDNS. They claim that: "People like MaraDNS because it's small, lightweight, easy to set up, and remarkably secure. It's also cross platform -- the program runs both in Windows and in UNIX clones". Hey, this is what I'm looking for. Great! So, comming back to our environment.

In my case, VirtualBox is installed on a Windows 7 box and the RAC nodes are virtualized with OL6. The idea is that I will have MaraDNS installed on my Windows 7 and it will service my virtualized RAC nodes. The following picture should shed some light.

Now let's configure MaraDNS to act like we want. The Windows version doesn't have any installer. You'll simply extract the archive and you'll have everything you want. Then, we need to edit "mararc" config file. I ended up with:

ipv4_bind_addresses = "127.0.0.1"
timestamp_type = 2
hide_disclaimer = "YES"
random_seed_file = "secret.txt"
csv2 = {}
csv2["altik.ro."] = "db.lan.txt"

I basically say:
  1. listen on localhost for requests (strange, isn't it? Be patient!)
  2. set the timestamp_type, but don't bother
  3. hide the disclaimer when launching the server. I like it quiet
  4. in secret.txt file I have some random text to secure my server, but don't bother too much
  5. everything with a trailing "altik.ro" (the domain) is mapped in the "db.lan.txt" file. Of course, you have to pick your domain. It doesn't need to be a real one, but if it exists it becomes unreachable.
Okey, great! Now let me see the "db.lan.txt" file:

ping.altik.ro.            FQDN4 192.168.1.100
pong.altik.ro.            FQDN4 192.168.1.101
ping-vip.altik.ro.        FQDN4 192.168.1.200
pong-vip.altik.ro.        FQDN4 192.168.1.201
poc-scan.altik.ro.        FQDN4 192.168.1.150
poc-scan.altik.ro.        FQDN4 192.168.1.151
poc-scan.altik.ro.        FQDN4 192.168.1.152

Mmm, that's all? Pretty much! Go ahead, start the server using the provided "bat" file and execute some nslookup commands. For example, "nslookup ping.altik.ro 127.0.0.1". Does it work? Test the SCAN as well. Every nslookup should return another IP (round robin algorithm).

Now, we have two problems:
  1. our DNS server is listening on localhost and there's no way to make requests from a remote node
  2. I'd also like to have internet on my RAC nodes in order to use YUM and other tasks which need Internet
No problem! In MaraDNS folder there is a sub-folder called "Deadwood...", which contain a so-called DNS recursive server. It doesn't allow to define your DNS entries but you can tell it to ask another DNS server for its entries and to cache them. Hey, that's exactly what we want. Ask the DNS server of my ISP for its entries and my local MaraDNS. Let's have a look at the config file (dwood3rc.txt):

root_servers = {}

# My MaraDNS
root_servers["192.in-addr.arpa."] = "127.0.0.1"
root_servers["altik.ro."] = "127.0.0.1"

# DNS server from my ISP
root_servers["."] = "213.154.124.1,"
root_servers["."] += "193.231.252.1"
# The IP this program has
bind_address="192.168.1.10"

# The IPs allowed to connect and use the cache
recursive_acl = "192.168.1.0/24"

# The file containing a hard-to-guess secret
random_seed_file = "secret.txt"

# This is the file Deadwood uses to read the cache to and from disk
cache_file = "dw_cache_bin"
# By default, for security reasons, Deadwood does not allow IPs in the
# 192.168.x.x, 172.[16-31].x.x, 10.x.x.x, 127.x.x.x, 169.254.x.x,
# 224.x.x.x, or 0.0.x.x range.  If using Deadwood to resolve names
# on an internal network, uncomment the following line:
filter_rfc1918 = 0

The above configuration is quite self explanatory. Everyting which refers to my local domain will be served from MaraDNS and everything else will be fetched from a real DNS server provided by my ISP. The "in-adr.arpa" entry is needed in order to support reverse DNS. Pay attention that this DNS server will listen on a real address, the one of my Windows 7 host.

Okay, great! Install Deadwood (as a service), and then test it with nslookup but instead of 127.0.0.1 use 192.168.1.10. If everything is working then it's just a matter of configuring your virtualized nodes to use 192.168.1.10 as a DNS server.

Friday, March 30, 2012

Escape HTML Entities Except for Some Well Known Tags

We are currently developing an APEX application and one of the requirement is to let users enter formatted text data into the system via a CKEDITOR control.


As it may be noticed above, not all the formatting options are enabled but just the possibility to apply bold, italic and underlined styling and, in addition, to create ordered and unordered lists. This means that the data we store into the database has html tags within and it must be echoed back as it is into the browser when the page which renders that data is about to be displayed.

Of course, the first thing you must take into consideration is the security issue: we don't want to render our pages with unsecure tags therefore we must sanitize the input. What we want is to remove all tags which are not into our whitelist. It seems like a very simple problem but the solution is not so obvious.

Our first attempt was to use a smart regular expression to do the job. It turned out that the expression was simply too complicated and very difficult to maintain. Put into the mix the fact that the Oracle regexp engine doesn't have all the goodies you'd expect if you are a Perl or Ruby developer and you quickly end up in disapointment.

The actual solution we implemented (which seems to work pretty well) is to parse the html content and to remove the offending tags. In order to parse we used just an ordinary xmltype. Let's see an example:
select regexp_replace( 
        deletexml(
          xmltype('<post>' ||  
                  'You have <b>been</b> <u>hacked</u> ' ||
                  'by <br/><a href="http://www.youtube.com/watch?v=CjPfBxJboJI">Gurin Remus Adrian</a>.' ||
                  '</post>'),
          '/post/*[(name() != "b") and (name() != "u")]'),
        '(^<post>)|(</post>$)',
        '') 
  from dual;

CLEAN_HTML                            
--------------------------------------
You have <b>been</b><u>hacked</u> by .
Great! Having that it's quite easy to write a PL/SQL function to embed the above logic:
  function strip_html_tags(html_in in clob, except_tags_in in varchar2 := null) return clob as
    l_clean_html clob;
    l_filter varchar2(32767) := '/post';
    l_predicates varchar2(32767);
    l_array apex_application_global.vc_arr2;
  begin
    if except_tags_in is null then
      l_filter := l_filter || '/*[(name() != "post")]';
    else
      l_array := apex_util.string_to_table(except_tags_in, ',');
      for i in 1..l_array.count loop
        if l_array(i) is not null then
          l_predicates := l_predicates || case when i > 1 then ' and ' else '' end || '(name() != "' || l_array(i) || '")';
        end if;
      end loop;
      if l_predicates is not null then
       l_filter := l_filter || '/*[' || l_predicates || ']';
      end if;
    end if;
    select regexp_replace( 
              deletexml(
                xmltype('<post>' || html_in || '</post>'),
                l_filter),
              '(^<post>)|(</post>$)',
           '') into l_clean_html
      from dual;
    return l_clean_html;
  end;
The only problem we had using this approach was with German characters or other HTML character entities. For example, by default CKEDIT sends the German umlaut character as &uuml; and, apparently, the Oracle XML parser is not smart enough to recognize this notation. It simply fails with LPX-00118 error. In fact, if you think about it, it's not so unusual given that we are talking about a XML parser, not an HTML one. However, if you encode umlaut with its numerical notation which is &#252; then everything is working like a charm. This boils down to properly configure CKEDITOR in order to force it to encode this characters with their numerical notation. The key property is:
entities_processNumerical": 'force'
Mission accomplished!

Wednesday, January 26, 2011

Light HTML Oracle Support Site

The standard Flash GUI of the actual Oracle Support site is driving me crazy. For a guy like me, used to work especially with console applications, addicted to VIM and browsing using Vimperator, having this Flash site in front of me really hurts. It hurts when I have to move my hand to click something with the mouse, it hurts when I cannot go back to the previous page, it hurts... it hurts... it hurts! Oracle Support is not a game, it doesn't need animation, video or other such things Flash is good at.

Okey, enough with all these complaints! If you don't like Flash, or your browser simply doesn't support it you may switch to the plain html interface: https://supporthtml.oracle.com. Isn't it cool? Just have a look:


Now I can use my beloved Vimperator and, of course, Flashblock.

However, I have to point out the limitations.The HTML option does not include the following functionality, which is only available in the Flash version of My Oracle Support:

* Systems
* Projects
* Healthchecks
* Patch Advice & Recommendations
* Inventory Reporting
* OnDemand Portal, Service Request and RFC Functionality
* CRM OnDemand Service Requests & Knowledge

As far as I'm concerned, I didn't use those features anyway therefore I'm good!

Monday, January 24, 2011

crs_stat pretty print on RAC 11.2

I really don't like the way crs_stat -t displays RAC status information on my 11.2.0.2 database. For example, one of my instances are down. Can you figure out this looking at the output below?

[grid@owl ~]$ crs_stat -t
Name           Type           Target    State     Host        
------------------------------------------------------------
ora.DATA.dg    ora....up.type ONLINE    ONLINE    hen         
ora....ER.lsnr ora....er.type ONLINE    ONLINE    hen         
ora....N1.lsnr ora....er.type ONLINE    ONLINE    hen         
ora....N2.lsnr ora....er.type ONLINE    ONLINE    hen         
ora....N3.lsnr ora....er.type ONLINE    ONLINE    owl         
ora.asm        ora.asm.type   ONLINE    ONLINE    hen         
ora.cvu        ora.cvu.type   ONLINE    ONLINE    hen         
ora.gns        ora.gns.type   ONLINE    ONLINE    hen         
ora.gns.vip    ora....ip.type ONLINE    ONLINE    hen         
ora.gsd        ora.gsd.type   OFFLINE   OFFLINE               
ora....SM2.asm application    ONLINE    ONLINE    hen         
ora....EN.lsnr application    ONLINE    ONLINE    hen         
ora.hen.gsd    application    OFFLINE   OFFLINE               
ora.hen.ons    application    ONLINE    ONLINE    hen         
ora.hen.vip    ora....t1.type ONLINE    ONLINE    hen         
ora....network ora....rk.type ONLINE    ONLINE    hen         
ora.oc4j       ora.oc4j.type  ONLINE    ONLINE    hen         
ora.ons        ora.ons.type   ONLINE    ONLINE    hen         
ora....SM1.asm application    ONLINE    ONLINE    owl         
ora....WL.lsnr application    ONLINE    ONLINE    owl         
ora.owl.gsd    application    OFFLINE   OFFLINE               
ora.owl.ons    application    ONLINE    ONLINE    owl         
ora.owl.vip    ora....t1.type ONLINE    ONLINE    owl         
ora.poc.db     ora....se.type ONLINE    ONLINE    hen         
ora....uci.svc ora....ce.type ONLINE    ONLINE    hen         
ora....ry.acfs ora....fs.type ONLINE    ONLINE    hen         
ora.scan1.vip  ora....ip.type ONLINE    ONLINE    hen         
ora.scan2.vip  ora....ip.type ONLINE    ONLINE    hen         
ora.scan3.vip  ora....ip.type ONLINE    ONLINE    owl         
The resource "ora.poc.db" is ONLINE therefore no red flags you might say. Well, bad luck: the database is up & running but with only one instance. The other one is dead. The database is policy managed but I want to be aware if all instances from the pool are running. How can we figure this out? Not a very big deal, just issue: "crsctl status resource". You'll get something like this:

NAME=ora.poc.db
TYPE=ora.database.type
TARGET=OFFLINE, ONLINE
STATE=OFFLINE, ONLINE on hen

Looking at TARGET and STATE is clear that something is OFFLINE and the red flag pops up right away. Of course, "crsctl status resource" command has the same problem as crs_stat when it comes to pretty print the result. It also has a tabular format (see the -t switch) but it'a a little bit verbose as it also displays the local resources status. But hey, do you remember Note 259301.1? It was about an awk script used for parsing the crs_stat output and displaying it in a nicer way. Okey, let's take that script and change it to take the output of the "crsctl status resource" command. I'm not an awk expert, but the following script is working pretty well:

#!/usr/bin/ksh
#
# Sample 10g CRS resource status query script
#
# Description:
#    - Returns formatted version of crs_stat -t, in tabular
#      format, with the complete rsc names and filtering keywords
#   - The argument, $RSC_KEY, is optional and if passed to the script, will
#     limit the output to HA resources whose names match $RSC_KEY.
# Requirements:
#   - $ORA_CRS_HOME should be set in your environment 

RSC_KEY=$1
QSTAT=-u
AWK=/bin/awk    # if not available use /usr/bin/awk

# Table header:echo ""
$AWK \
  'BEGIN {printf "%-45s %-10s %-18s\n", "HA Resource", "Target", "State";
          printf "%-45s %-10s %-18s\n", "-----------", "------", "-----";}'

# Table body:
$ORACLE_HOME/bin/crsctl status resource | $AWK \
 '
function ltrim(s) { sub(/^[ \t]+/, "", s); return s }
function rtrim(s) { sub(/[ \t]+$/, "", s); return s }
function trim(s)  { return rtrim(ltrim(s)); }

  BEGIN { FS="="; state = 0; }
  $1~/NAME/ && $2~/'$RSC_KEY'/ {appname = $2; state=1};
  state == 0 {next;}
  $1~/TARGET/ && state == 1 {apptarget = $2; split(apptarget, atarget, ","); state=2;}
  $1~/STATE/ && state == 2 {appstate = $2; split(appstate, astate, ","); state=3;}
  state == 3 { split(appname, a, ","); 
               for (i = 1; i <= length(atarget); i++) { 
                 printf "%-45s %-10s %-18s\n", appname, trim(atarget[i]), trim(astate[i]) 
               }; 
               state=0;}'
And the output is:
[grid@owl ~]$ cs
HA Resource                                   Target     State             
-----------                                   ------     -----             
ora.DATA.dg                                   ONLINE     ONLINE on hen     
ora.DATA.dg                                   ONLINE     ONLINE on owl     
ora.LISTENER.lsnr                             ONLINE     ONLINE on hen     
ora.LISTENER.lsnr                             ONLINE     ONLINE on owl     
ora.LISTENER_SCAN1.lsnr                       ONLINE     ONLINE on hen     
ora.LISTENER_SCAN2.lsnr                       ONLINE     ONLINE on hen     
ora.LISTENER_SCAN3.lsnr                       ONLINE     ONLINE on owl     
ora.asm                                       ONLINE     ONLINE on hen     
ora.asm                                       ONLINE     ONLINE on owl     
ora.cvu                                       ONLINE     ONLINE on hen     
ora.gns                                       ONLINE     ONLINE on hen     
ora.gns.vip                                   ONLINE     ONLINE on hen     
ora.gsd                                       OFFLINE    OFFLINE           
ora.gsd                                       OFFLINE    OFFLINE           
ora.hen.vip                                   ONLINE     ONLINE on hen     
ora.net1.network                              ONLINE     ONLINE on hen     
ora.net1.network                              ONLINE     ONLINE on owl     
ora.oc4j                                      ONLINE     ONLINE on hen     
ora.ons                                       ONLINE     ONLINE on hen     
ora.ons                                       ONLINE     ONLINE on owl     
ora.owl.vip                                   ONLINE     ONLINE on owl     
ora.poc.db                                    OFFLINE    OFFLINE           
ora.poc.db                                    ONLINE     ONLINE on hen     
ora.poc.muci.svc                              ONLINE     ONLINE on hen     
ora.poc.muci.svc                              ONLINE     OFFLINE           
ora.registry.acfs                             ONLINE     ONLINE on hen     
ora.registry.acfs                             ONLINE     ONLINE on owl     
ora.scan1.vip                                 ONLINE     ONLINE on hen     
ora.scan2.vip                                 ONLINE     ONLINE on hen     
ora.scan3.vip                                 ONLINE     ONLINE on owl     

Looking at the above output I can clearly see the partial OFFLINE status of my database. From my point of view, is much better.

Tuesday, December 14, 2010

Kill a Session From Any Node

I really like this new 11g feature which allows the DBA to kill a session despite his session is on a different instance than the instance where the session to be killed resides. The ALTER SYSTEM KILL SESSION statement has been improved and allows specifying the instance number where the session you want to kill is located:

ALTER SYSTEM KILL SESSION 'sid, serial#, @inst_no';

Great!

Saturday, December 11, 2010

Extending my RAC with a new node

I have a 11.2.0.2 database comprised of one node. I especially created it with one node just to have the chance to add another node later. Why? Because I wanted to play with this new GPnP feature. So, despite my RAC was comprised of one node, it was actually a fully functional environment, with GNS, IPMI, CTSAS and a policy managed database. Okey, the process should be straightforward: run some CVU checks to see if the node to be added is ready and then run addNode.sh script from the GI home of the existing RAC node. In my case, the existing node was named "owl" and the node to be added was "hen".

First of all, I ran:
[grid@owl bin]$ cluvfy stage -pre nodeadd -n hen

Performing pre-checks for node addition

Checking node reachability...
Node reachability check passed from node "owl"


Checking user equivalence...
User equivalence check passed for user "grid"

Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Node connectivity check passed


Checking CRS integrity...

CRS integrity check passed

Checking shared resources...

Checking CRS home location...
The location "/u01/app/11.2.0.2/grid" is not shared but is present/creatable on all nodes
Shared resources check for node addition passed


Checking node connectivity...

Checking hosts config file...

Verification of the hosts config file successful

Check: Node connectivity for interface "eth0"
Node connectivity passed for interface "eth0"

Check: Node connectivity for interface "eth1"
Node connectivity passed for interface "eth1"

Node connectivity check passed

Total memory check passed
Available memory check passed
Swap space check passed
Free disk space check passed for "owl:/tmp"
Free disk space check passed for "hen:/tmp"
Check for multiple users with UID value 1100 passed
User existence check passed for "grid"
Run level check passed
Hard limits check passed for "maximum open file descriptors"
Soft limits check passed for "maximum open file descriptors"
Hard limits check passed for "maximum user processes"
Soft limits check passed for "maximum user processes"
System architecture check passed
Kernel version check passed
Kernel parameter check passed for "semmsl"
Kernel parameter check passed for "semmns"
Kernel parameter check passed for "semopm"
Kernel parameter check passed for "semmni"
Kernel parameter check passed for "shmmax"
Kernel parameter check passed for "shmmni"
Kernel parameter check passed for "shmall"
Kernel parameter check passed for "file-max"
Kernel parameter check passed for "ip_local_port_range"
Kernel parameter check passed for "rmem_default"
Kernel parameter check passed for "rmem_max"
Kernel parameter check passed for "wmem_default"
Kernel parameter check passed for "wmem_max"
Kernel parameter check passed for "aio-max-nr"
Package existence check passed for "make-3.81( x86_64)"
Package existence check passed for "binutils-2.17.50.0.6( x86_64)"
Package existence check passed for "gcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "glibc-2.5-24 (x86_64)( x86_64)"
Package existence check passed for "compat-libstdc++-33-3.2.3 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-0.125 (x86_64)( x86_64)"
Package existence check passed for "elfutils-libelf-devel-0.125( x86_64)"
Package existence check passed for "glibc-common-2.5( x86_64)"
Package existence check passed for "glibc-devel-2.5 (x86_64)( x86_64)"
Package existence check passed for "glibc-headers-2.5( x86_64)"
Package existence check passed for "gcc-c++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libaio-devel-0.3.106 (x86_64)( x86_64)"
Package existence check passed for "libgcc-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "libstdc++-devel-4.1.2 (x86_64)( x86_64)"
Package existence check passed for "sysstat-7.0.2( x86_64)"
Package existence check passed for "ksh-20060214( x86_64)"
Check for multiple users with UID value 0 passed
Current group ID check passed

Checking OCR integrity...

OCR integrity check passed

Checking Oracle Cluster Voting Disk configuration...

Oracle Cluster Voting Disk configuration check passed
Time zone consistency check passed

Starting Clock synchronization checks using Network Time Protocol(NTP)...

NTP Configuration file check started...
No NTP Daemons or Services were found to be running

Clock synchronization check using Network Time Protocol(NTP) passed


User "grid" is not part of "root" group. Check passed
Checking consistency of file "/etc/resolv.conf" across nodes

File "/etc/resolv.conf" does not have both domain and search entries defined
domain entry in file "/etc/resolv.conf" is consistent across nodes
search entry in file "/etc/resolv.conf" is consistent across nodes
All nodes have one search entry defined in file "/etc/resolv.conf"
The DNS response time for an unreachable node is within acceptable limit on all nodes

File "/etc/resolv.conf" is consistent across nodes


Checking GNS integrity...
The GNS subdomain name "vmrac.fits.ro" is a valid domain name
GNS VIP "poc-gns-vip.vmrac.fits.ro" resolves to a valid IP address
PRVF-5229 : GNS VIP is active before Clusterware installation

PRVF-5232 : The GNS subdomain qualified host name "hen.vmrac.fits.ro" was resolved into an IP address

GNS integrity check failed
Pre-check for node addition was unsuccessful on all the nodes.
PRVF-5229 is really a strange error: of course the GNS VIP is active because I already have my RAC installed. It really makes sense when installing a new RAC and the GNS vip sould be unallocated but otherwise I don't get it. So, I decided to go on even the CVU was complaining.

The next step would be to run addNode.sh script from [GI_HOME]/oui/bin location. I ran the script and I found that it does nothing if the CVU checks are not passed. You can figure out this if you run the script with debugging:

[grid@owl bin]$ sh -x ./addNode.sh -silent "CLUSTER_NEW_NODES={hen}"
+ OHOME=/u01/app/11.2.0.2/grid
+ INVPTRLOC=/u01/app/11.2.0.2/grid/oraInst.loc
+ ADDNODE='/u01/app/11.2.0.2/grid/oui/bin/runInstaller -addNode -invPtrLoc /u01/app/11.2.0.2/grid/oraInst.loc ORACLE_HOME=/u01/app/11.2.0.2/grid -silent CLUSTER_NEW_NODES={hen}'
+ '[' '' = Y -o '!' -f /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl ']'
+ CHECK_NODEADD='/u01/app/11.2.0.2/grid/perl/bin/perl /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl -pre -silent CLUSTER_NEW_NODES={hen}'
+ /u01/app/11.2.0.2/grid/perl/bin/perl /u01/app/11.2.0.2/grid/cv/cvutl/check_nodeadd.pl -pre -silent 'CLUSTER_NEW_NODES={hen}'
+ '[' 1 -eq 0 ']'

As you can see, the check_nodeadd.pl script ends with a non-zero exit code which means error (this perl script is really running the cluvfy utility so, it fails because of the GNS check). The only workaround I found was to ignore this checking using: export IGNORE_PREADDNODE_CHECKS=Y
After that I was able to successfully run addNode.sh script:

[grid@owl bin]$ ./addNode.sh -silent "CLUSTER_NEW_NODES={hen}"
Starting Oracle Universal Installer...

... output truncated ...

Saving inventory on nodes (Friday, December 10, 2010 8:49:27 PM EET)
.                                                               100% Done.
Save inventory complete
WARNING:A new inventory has been created on one or more nodes in this session. However, it has not yet been registered as the central inventory of this system.
To register the new inventory please run the script at '/u01/app/oraInventory/orainstRoot.sh' with root privileges on nodes 'hen'.
If you do not register the inventory, you may not be able to update or patch the products you installed.
The following configuration scripts need to be executed as the "root" user in each cluster node.
/u01/app/oraInventory/orainstRoot.sh #On nodes hen
/u01/app/11.2.0.2/grid/root.sh #On nodes hen
To execute the configuration scripts:
    1. Open a terminal window
    2. Log in as "root"
    3. Run the scripts in each cluster node

The Cluster Node Addition of /u01/app/11.2.0.2/grid was successful.
Please check '/tmp/silentInstall.log' for more details.

Okey, GREAT! Let's run those scripts on the new node:
[root@hen app]# /u01/app/oraInventory/orainstRoot.sh
Creating the Oracle inventory pointer file (/etc/oraInst.loc)
Changing permissions of /u01/app/oraInventory.
Adding read,write permissions for group.
Removing read,write,execute permissions for world.

Changing groupname of /u01/app/oraInventory to oinstall.
The execution of the script is complete.

[root@hen app]# /u01/app/11.2.0.2/grid/root.sh
Running Oracle 11g root script...

The following environment variables are set as:
    ORACLE_OWNER= grid
    ORACLE_HOME=  /u01/app/11.2.0.2/grid

Enter the full pathname of the local bin directory: [/usr/local/bin]: 
The contents of "dbhome" have not changed. No need to overwrite.
The contents of "oraenv" have not changed. No need to overwrite.
The contents of "coraenv" have not changed. No need to overwrite.


Creating /etc/oratab file...
Entries will be added to the /etc/oratab file as needed by
Database Configuration Assistant when a database is created
Finished running generic part of root script.
Now product-specific root actions will be performed.
Using configuration parameter file: /u01/app/11.2.0.2/grid/crs/install/crsconfig_params
Creating trace directory
PROTL-16: Internal Error
Failed to create or upgrade OLR
 Failed to create or upgrade OLR at /u01/app/11.2.0.2/grid/crs/install/crsconfig_lib.pm line 6740.
/u01/app/11.2.0.2/grid/perl/bin/perl -I/u01/app/11.2.0.2/grid/perl/lib -I/u01/app/11.2.0.2/grid/crs/install /u01/app/11.2.0.2/grid/crs/install/rootcrs.pl execution failed

Ups! I did not see that coming! First of all, OLR?! Yea, it's like an OCR but local. The only note I found about this error was 1123453.1 and it advises to double check if all install prerequisites are passed using cluvfy. In my case, the only problem I had was with the GNS check. Does GNS have anything to do with my error? As it turned out, no, it doesn't! The big mistake I made (and the cluvfy didn't notice that) was that the SSH setup between nodes was wrong. Connecting from owl to hen was okey, but not vice-versa. After I fixed the SSH configuration the root.sh script was executed without any problems. Great!

The next step was to clone the database oracle home. That was really easy: just run the addNode.sh in the same way I did for GI. So far so good... at this point I was expecting that little magic to happen. Look what the documentation says:

If you store your policy-managed database on Oracle Automatic Storage Management (Oracle ASM), Oracle Managed Files (OMF) is enabled, and if there is space in a server pool for node2, then crsd adds the Oracle RAC instance to node2 and no further action is necessary. If OMF is not enabled, then you must manually add undo and redo logs.

Hey, that's my case! Unfortunately, the new instance didn't show up. Furthermore, the pool configuration was asking for a new node:
[oracle@hen oracle]$ srvctl config srvpool -g poc
Server pool name: poc
Importance: 10, Min: 2, Max: -1
Candidate server names: 
Look, I have increased the importance level and I set the "Min" property to 2. Damn it! I don't know why the new server was not automatically picked up, but maybe is also my leak of experience concerning this new server pools concept. In the end I launched "dbca" from the new added node hoping that some new magic options were added. But, no... even the "Instance Management" option was disabled. But, if you are choosing "Configure database" and next, next, next until the SYSDBA credentials are requested then dbca will try to connect to the local instance and it will actually create this new instance. I'm sure this is not the way it was supposed to work but, at least, I could see some results. However, there was another interesting thing. Looking into the alert of the new created instance I found:
Could not open audit file: /u01/app/oracle/admin/poc/adump/poc_2_ora_18197_1.aud
Retry Iteration No: 1   OS Error: 2
Retry Iteration No: 2   OS Error: 2
Retry Iteration No: 3   OS Error: 2
Retry Iteration No: 4   OS Error: 2
Retry Iteration No: 5   OS Error: 2
OS Audit file could not be created; failing after 5 retries
I didn't create the /u01/app/oracle/admin/poc/adump folder on my new node and that was causing the error. So, this is another thing I should remember... as part of the addNode.sh cloning process the "adump" location is not automatically created.
And, that's all! Now, my fancy RAC has a new baby node.

Wednesday, December 08, 2010

Upgrade GI to 11.2.0.2: Simply Surprising...

I never thought I'd write a post about such a trivial task... Well, if you are going to upgrade from 11.2.0.1 to 11.2.0.2 be prepared for surprises.

The first surprise is given by the download page from the oracle support site (formally know as Oracle Metalink). The 11.2.0.2 patch set has 4.8G! WTF?! Furthermore, it is split in 7 pieces... Despite of this huge size, the good thing is that, unlike the previous releases, this patch-set may be used as a self-contained Oracle installer, which means you don't have to install a base 11.2.0.1 release and, after that, to apply the 11.2.0.2 patch-set on top of it, but you may simply install the 11.2.0.2 release directly. There's one more catch: if you want to upgrade just the Grid Infrastructure you don't need all 7 pieces from the patch-set. On the download page is not very clear mentioned but if you have the curiosity to open the README (and you should!) then you'll find the following:

Great! So, for the beginning we'd need the 3rd piece in order to upgrade our Grid Infrastructure.

The second surprise is the fact that the GI cannot be in-place upgraded. In previous releases we used to patch providing an existing home location. Starting with 11.2.0.2 in-place upgrades for GI are not supported. According to the "Upgrade" guide:

As of Oracle Database 11g release 2 (11.2), the Oracle Clusterware software must be upgraded to a new home location in the Oracle grid infrastructure home. Additionally, Oracle ASM and Oracle Clusterware (and Oracle Restart for single-instance databases) must run in the same Oracle grid infrastructure home. When upgrading Oracle Clusterware to release 11.2, OUI automatically calls Oracle ASM Cluster Assistant (ASMCA) to perform the upgrade into the grid infrastructure home.

Okey, good to know! Let's start the upgrade process of GI. The wizard provided by the OUI is quite intuitive therefore I will not bother you with screenshots and other obvious things. However, the next surprise comes when you are running the rootupgrade.sh script. The error is:
Failed to add (property/value):('OLD_OCR_ID/'-1') for checkpoint:ROOTCRS_OLDHOMEINFO.Error code is 256
 The fixes for bug 9413827 are not present in the 11.2.0.1 crs home
 Apply the patches for these bugs in the 11.2.0.1 crs home and then run 
 rootupgrade.sh /oragi/perl/bin/perl -I/oragi/perl/lib -I/oragi/crs/install /oragi/crs/install/rootcrs.pl execution failed
WTF? You cannot patch if you don't have another stupid patch already there. Okey, as an Oracle DBA you have to be a patient guy... take a deep breath and start looking for bug 9413827. First of all there is the 10036834.8 note, which basically says that you might still get this error even if you apply the patch for the 9413827 bug. As an workaround they suggest to also apply the patch for 9655006 bug. That's madness! In the end it turns out that 9655006 patch is actually the July 10.2.0.1.2 PSU. Okey, just download the appropriate version for your platform. Now, another surprise... you need an updated version of OPatch utility. Damn it! Back to metalink, search for patch 6880880 and download the 11.2.0.0.0 version for your platform (Take care not to download the wrong version. By the way, did you noticed that you may download a wget script which can be used to download the patch without using a browser? Yea, finally something good on that shitty flash GUI.) According to the README they suggest to unzip the upgraded OPatch utility directly into your CRS home, using something like:
unzip [p6880880...zip] -d [your GI home]
... which I did!
Now, you have to unzip the PSU patch into an empty folder, let's say /u01/stage, and run the following command as root:
/OPatch/opatch auto /u01/stage/ -och [your GI home]
In my case, the output was:
Executing /usr/bin/perl /u01/app/11.2.0.1/grid/OPatch/crs/patch112.pl -patchdir /u01 -patchn stage -och /u01/app/11.2.0.1/grid/ -paramfile /u01/app/11.2.0.1/grid/crs/install/crsconfig_params
2010-12-08 12:32:19: Parsing the host name
2010-12-08 12:32:19: Checking for super user privileges
2010-12-08 12:32:19: User has super user privileges
Using configuration parameter file: /u01/app/11.2.0.1/grid/crs/install/crsconfig_params
The opatch Component check failed. This patch is not applicable for /u01/app/11.2.0.1/grid/
The opatch Component check failed. This patch is not applicable for /u01/app/11.2.0.1/grid/
Patch Component/Conflict  check failed for /u01/app/11.2.0.1/grid/
Upssy! Another surprise! This patch is not applicable for bla bla bla? Are you serious? Let's check the logs. They should be in your $CRS_HOME/cfgtoollogs. Search for a log file named as opatchauto[timestamp].log. The important part for the log:
2010-12-08 12:32:19: The component check failed with following error
2010-12-08 12:32:19: bash: /u01/app/11.2.0.1/grid/OPatch/opatch: Permission denied
Ha? I'm root! Aaaa... okey! Apparently it tries to run the OPatch tool under the grid user. Okey, let's fix the permissions.
chown root:oinstall /u01/app/11.2.0.1/grid/OPatch -R
chmod g+r /u01/app/11.2.0.1/grid/OPatch/opatch
Now, try again! Yeap... now it's working.
After applying the patch we are ready for our rootupgrade.sh. It's interesting that the output still contains the Failed to add (property/value):('OLD_OCR_ID/'-1') message but the upgrade continues without any other complaints. Okey, let's perform a quick check:
srvctl config asm
ASM home: /u01/app/11.2.0.2/grid
ASM listener: LISTENER

srvctl config listener -a
Name: LISTENER
Network: 1, Owner: grid
Home: 
  /u01/app/11.2.0.2/grid on node(s) owl
End points: TCP:1521
Great, the ASM and listeners are relocated to the new GI home. The next logical thing to do is to uninstall the old GI home, right? It's as simple as:
/deinstall/deinstall
Oookey, meet SURPRISE Number 6:
ERROR: You must delete or downgrade the Oracle RAC databases and de-install the Oracle RAC homes before attempting to remove the Oracle Clusterware homes.
Isn't it great? On metalink I found Bug 10332736 and, on the WORKAROUND section, it says something about writing a note with a manual uninstall procedure. However, at the time of writing this, the note wasn't published yet. Yea... all I can say is that I'm tired of these stupid issues. What happend with the Oracle testing department? They encourage to patch frequently but, as far as I'm concerned, I always have this creepy feeling before doing it.