Showing posts with label RMAN.

Sunday, May 02, 2010

Autobackup CF with Flash Recovery Area

In our office we have a 10g RAC database with a flash recovery area (FRA) enabled, pointing to an ASM disk group. Nothing special, I would say... However, from time to time our nightly backup script fails, complaining that it can't find some obsolete backups that it is supposed to delete:

RMAN-06207: WARNING: 4 objects could not be deleted for DISK channel(s) due
RMAN-06208: to mismatched status. Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212: Object Type Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Backup Piece /u01/app/oracle/product/10.2.0/db_1/dbs/c-24173594-20100427-00
RMAN-06214: Backup Piece /u01/app/oracle/product/10.2.0/db_1/dbs/c-24173594-20100427-01
RMAN-06214: Backup Piece /u01/app/oracle/product/10.2.0/db_1/dbs/c-24173594-20100428-00
RMAN-06214: Backup Piece /u01/app/oracle/product/10.2.0/db_1/dbs/c-24173594-20100428-01

That's weird! All those backup pieces are controlfile autobackups. RMAN looks for them on a local filesystem and, this being a RAC database, those files are obviously accessible from just one node. But how did they get there? They were supposed to be placed on our shared storage, in the FRA to be more precise. Well, let's look once again at our settings:

SQL> show parameter recov

NAME TYPE VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest string +DG1
db_recovery_file_dest_size big integer 150000M
recovery_parallelism integer 0

Okay, it's clear we have an FRA! What about the RMAN settings?

RMAN> show all;

using target database control file instead of recovery catalog
RMAN configuration parameters are:
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 2 DAYS;
CONFIGURE BACKUP OPTIMIZATION OFF; # default
CONFIGURE DEFAULT DEVICE TYPE TO DISK; # default
CONFIGURE CONTROLFILE AUTOBACKUP ON;
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F';
CONFIGURE DEVICE TYPE DISK PARALLELISM 4 BACKUP TYPE TO COMPRESSED BACKUPSET;
CONFIGURE DATAFILE BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE ARCHIVELOG BACKUP COPIES FOR DEVICE TYPE DISK TO 1; # default
CONFIGURE MAXSETSIZE TO UNLIMITED; # default
CONFIGURE ENCRYPTION FOR DATABASE OFF; # default
CONFIGURE ENCRYPTION ALGORITHM 'AES128'; # default
CONFIGURE ARCHIVELOG DELETION POLICY TO NONE; # default
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/10.2.0/db_1/dbs/snapcf_fd1.f'; # default

It looks good... the controlfile autobackup format is '%F', which is the default one, right? The documentation confirms that:

The default location for the autobackup on disk is the flash recovery area (if configured) or a platform-specific location (if not configured). RMAN automatically backs up the current control file using the default format of %F.

Okay, we have a flash recovery area and a %F default autobackup format... WTF? Well, the answer is given by Metalink note 338483.1. Apparently, there is a big difference between explicitly setting the autobackup format to its default value and leaving it at (or resetting it to) its default... Interesting, huh? It is... So, if you explicitly set the autobackup format to %F, the autobackup file goes to an OS-specific location, which on Linux is $ORACLE_HOME/dbs. But if the autobackup format is at its default (either explicitly reset, or never set at all) and you have an FRA configured, then the autobackup file actually goes to the FRA.
So, in my case the solution was simple (please notice the "# default" marker):

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK CLEAR;

old RMAN configuration parameters:
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F';
RMAN configuration parameters are successfully reset to default value

RMAN> show controlfile autobackup format;

RMAN configuration parameters are:
CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '%F'; # default

Ooookey, really really unintuitive... I think the Oracle documentation should be more precise regarding this.
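
The configuration change only affects new autobackups. The four pieces from the warning are still recorded in the RMAN repository, so the CROSSCHECK suggested by the message is still needed. Below is a minimal sketch of that cleanup, assuming an RMAN session connected to the target database. Keep in mind that on RAC a crosscheck run from a node that cannot see the other node's $ORACLE_HOME/dbs marks those pieces as EXPIRED, and deleting expired records does not remove the files themselves:

```rman
# Re-validate the controlfile backup pieces recorded in the repository
CROSSCHECK BACKUP OF CONTROLFILE;

# Review what was marked EXPIRED before removing anything
LIST EXPIRED BACKUP OF CONTROLFILE;

# Drop the repository records for pieces that are no longer reachable
DELETE NOPROMPT EXPIRED BACKUP OF CONTROLFILE;
```

Any leftover c-24173594-* files in $ORACLE_HOME/dbs on the other node would then have to be removed by hand at the OS level.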

Friday, March 19, 2010

When an RMAN retention policy based on REDUNDANCY is a bad idea...

Suppose you have an RMAN retention policy of "REDUNDANCY 2". This means that as long as you have at least two backups of the same datafile, controlfile/spfile or archivelog, any older backups become obsolete and RMAN is allowed to remove them.

Now, let's also suppose that every night you back up your database using the following script:
CONFIGURE CONTROLFILE AUTOBACKUP ON;
run {
backup database plus archivelog;
delete noprompt obsolete redundancy 2;
}

The backup task is quite simple: first of all it ensures that the controlfile autobackup feature is on, then it backs up the database and the archives and, at the end, it deletes all obsolete backups using the REDUNDANCY 2 retention policy.
Using the above approach you might think that you can restore your database as it was two days ago, right? For example, if you have a backup taken on Monday and another one taken on Tuesday, you may think you can restore your database to any point between the last Monday backup and today. Well, that's wrong!

Consider the following scenario:
1. On Monday night you back up the database using the above script;
2. On Tuesday, during the day, you drop a tablespace. Because this is a structural database change, a controlfile autobackup is triggered. Yay, you have a new controlfile backup.
3. On Tuesday night you back up the database again... nothing unusual, right?

Well, the tricky part is the DELETE OBSOLETE command. When the backup script runs this command, RMAN finds three controlfile backups: one from the Monday backup, one from the structural change and a third from the just-finished Tuesday backup. Now, according to the retention policy of "REDUNDANCY 2", RMAN assumes that it is safe to delete the controlfile backup taken on Monday night, because it is the oldest one and falls outside the retention policy. Oops... this means that we are going to have a big problem restoring the database as it was before the structural change, because we no longer have a controlfile backup from that time.

So, if you intend to perform incomplete recovery of your database to a previous point in time, it's really a good idea to switch to a retention policy based on a "RECOVERY WINDOW" instead. In our case a RECOVERY WINDOW OF 2 DAYS would be more appropriate.
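
Below is a minimal sketch of the nightly job with a window-based policy. It assumes the policy is configured persistently, so that DELETE OBSOLETE picks it up without an explicit REDUNDANCY clause. REPORT OBSOLETE is added only to preview what would be removed:

```rman
# One-time configuration: keep whatever is needed to recover
# to any point within the last 2 days
CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF 2 DAYS;
CONFIGURE CONTROLFILE AUTOBACKUP ON;

# Nightly job
BACKUP DATABASE PLUS ARCHIVELOG;

# Preview, then delete, according to the configured retention policy
REPORT OBSOLETE;
DELETE NOPROMPT OBSOLETE;
```

With this policy the number of controlfile autobackups taken during the day no longer decides which backups are considered obsolete.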

Sunday, November 29, 2009

Strange RMAN snapshot controlfile issue

A strange thing happened today. I executed a DELETE OBSOLETE command at my RMAN prompt and it reported the snapshot controlfile as obsolete. I don't know under which circumstances this problem occurs and I couldn't find any relevant information on forums or Metalink (oh, sorry! "My Oracle Support") about this.

Below is the output of the DELETE OBSOLETE command:
RMAN> delete obsolete;

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
using channel ORA_DISK_2
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Control File Copy 36 29-11-2009 12:35:33 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f

Do you really want to delete the above objects (enter YES or NO)? y
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of delete command on ORA_DISK_2 channel at 11/29/2009 21:11:16
ORA-19606: Cannot copy or restore to snapshot control file


Indeed, this is the default configured snapshot controlfile:
RMAN> show snapshot controlfile name;               

RMAN configuration parameters for database with db_unique_name TETRIS are:
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f';

It seems I'm in a kind of deadlock here. The snapshot controlfile is reported as obsolete but it can't be deleted because it is in use by RMAN. The only solution I found was to change the RMAN configuration to use another snapshot controlfile, then remove the one reported as obsolete, and finally switch back to the default. However, the question remains: why is the snapshot controlfile reported as obsolete?
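
That workaround can be sketched as follows. The temporary file name snapcf_tmp.f is a hypothetical placeholder. Any writable path other than the current snapshot controlfile should do:

```rman
# Point RMAN at a temporary snapshot controlfile (hypothetical name)
CONFIGURE SNAPSHOT CONTROLFILE NAME TO '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tmp.f';

# The old snapshot controlfile is no longer the configured one, so it can be removed
DELETE OBSOLETE;

# Switch back to the default snapshot controlfile name
CONFIGURE SNAPSHOT CONTROLFILE NAME CLEAR;
```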

PS: This happened on an 11gR2 database installed on a Linux x86 platform.

Update: Apparently this is encountered after running a DUPLICATE ... FROM ACTIVE DATABASE. Furthermore, the snapshot controlfile is reported as a "datafile copy" in the warning that suggests a CROSSCHECK. See below:
RMAN> delete obsolete;                                                                                                                                                                                           

RMAN retention policy will be applied to the command
RMAN retention policy is set to redundancy 1
using channel ORA_DISK_1
using channel ORA_DISK_2
Deleting the following obsolete backups and copies:
Type Key Completion Time Filename/Handle
-------------------- ------ ------------------ --------------------
Control File Copy 40 30-11-2009 18:41:15 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f

Do you really want to delete the above objects (enter YES or NO)? y

RMAN-06207: WARNING: 1 objects could not be deleted for DISK channel(s) due
RMAN-06208: to mismatched status. Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212: Object Type Filename/Handle
RMAN-06213: --------------- ---------------------------------------------------
RMAN-06214: Datafile Copy /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f

Obviously, that can't be a datafile copy. So, let's try a crosscheck as suggested:
RMAN> crosscheck datafilecopy '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f';                                                                                                                     

using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=148 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=140 device type=DISK
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of crosscheck command at 11/30/2009 19:09:43
RMAN-20230: datafile copy not found in the repository
RMAN-06015: error while looking up datafile copy name: /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f

Okay, this was expected, as I don't have any datafile copy with that name, despite what RMAN says. So, let's try a crosscheck for the controlfile copy:
RMAN> crosscheck controlfilecopy '/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f';                                                                                                                  

released channel: ORA_DISK_1
released channel: ORA_DISK_2
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=148 device type=DISK
allocated channel: ORA_DISK_2
channel ORA_DISK_2: SID=140 device type=DISK
validation failed for control file copy
control file copy file name=/u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f
RECID=40 STAMP=704313675
Crosschecked 1 objects

As can be seen, the validation fails, although the file exists at that location:
$ ls -al /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f
-rw-r----- 1 oracle oinstall 10436608 Nov 30 18:57 /u01/app/oracle/product/11.2.0/dbhome_1/dbs/snapcf_tetris.f

I don't know if this is documented somewhere but it looks to me like a bug. No idea why the snapshot control file is messed up after a DUPLICATE TARGET DATABASE ... FROM ACTIVE DATABASE.

Friday, November 27, 2009

TSPITR to recover a dropped tablespace

A nice feature of Oracle 11gR2 is the ability to recover a dropped tablespace using TSPITR. Of course, for this to succeed you need valid backups. Let's test this! First of all, just to be on the safe side, take a fresh backup of the database:
BACKUP DATABASE PLUS ARCHIVELOG;

Then supposing you have a "MUCI" tablespace, simply drop it:
drop tablespace MUCI including contents;

Let's try to recover the "MUCI" tablespace. You'll need the nearest timestamp or SCN before the tablespace was dropped.

If you are tempted to use fully automatic TSPITR then be prepared for trouble. This is what happened to me when I tried it:
RMAN> recover tablespace muci until scn 2240386 auxiliary destination '/u01/app/backup';

...

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 11/27/2009 21:57:13
RMAN-06965: Datapump job has stopped
RMAN-06961: IMPDP> Job "SYS"."TSPITR_IMP_hilc" stopped due to fatal error at 21:57:09
RMAN-06961: IMPDP> ORA-39123: Data Pump transportable tablespace job aborted
ORA-01565: error in identifying file '/u01/app/oracle/oradata/TETRIS/datafile/o1_mf_muci_5k0bwdmb_.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3


I googled it and found a post which recommends dropping the tablespace without "AND DATAFILES" but, as far as I'm concerned, it didn't work.
Nevertheless, setting a new name for the datafile which belonged to the dropped tablespace did the job.
RMAN> run {
2> set newname for datafile 6 to new;
3> recover tablespace muci until scn 2240386 auxiliary destination '/u01/app/backup';
4> }

A direct consequence of this in 11gR2 is that you can run TSPITR multiple times for the same tablespace without using a recovery catalog. If you chose a wrong SCN and you have already brought the recovered tablespace ONLINE, you can simply drop it and try again with another SCN.
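
For reference, here is a sketch of the same recovery expressed with a timestamp instead of an SCN, followed by the steps that put the tablespace back into service. The timestamp is a made-up placeholder. The datafile number and the auxiliary destination are the ones used above. After TSPITR the tablespace is left offline, so it is backed up and then brought online:

```rman
RUN {
  # Let Oracle generate a new name for the datafile of the dropped tablespace
  SET NEWNAME FOR DATAFILE 6 TO NEW;
  # Placeholder timestamp: use a time just before the DROP TABLESPACE
  RECOVER TABLESPACE muci
    UNTIL TIME "TO_DATE('2009-11-27 21:30:00', 'YYYY-MM-DD HH24:MI:SS')"
    AUXILIARY DESTINATION '/u01/app/backup';
}

# The recovered tablespace is offline at this point
BACKUP TABLESPACE muci;
SQL 'ALTER TABLESPACE muci ONLINE';
```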

Awesome!

Wednesday, November 18, 2009

Do archivelogs become obsolete if they contain blocks from a BEGIN BACKUP operation?

Of course, not every possible case is described in the docs, so some of them simply have to be tried. So, today I was wondering what would happen if I left a tablespace in BEGIN BACKUP mode and continued to back up the database using:
RUN {
BACKUP DATABASE PLUS ARCHIVELOG;
DELETE NOPROMPT OBSOLETE;
}

As you already know, when a tablespace is put in BEGIN BACKUP mode, the first change to each of its blocks causes the whole block image to be written to the redo logs, which will eventually be archived. My main concern here was the DELETE OBSOLETE command. Is RMAN smart enough to know that those archives are not going to become obsolete as long as the BEGIN BACKUP status is in place? After some tests I can conclude: RMAN knows this and will NOT consider those archives obsolete. This was kind of obvious but, you know... it's always good to try it and see with your own eyes.
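
Here is a sketch of how such a test can be set up from the RMAN prompt. The tablespace name USERS is only an example. REPORT OBSOLETE lists what RMAN would consider obsolete, so it can be inspected before anything is deleted:

```rman
# Leave a tablespace in backup mode (example tablespace name)
SQL 'ALTER TABLESPACE users BEGIN BACKUP';

BACKUP DATABASE PLUS ARCHIVELOG;

# Inspect what RMAN considers obsolete, then delete it
REPORT OBSOLETE;
DELETE NOPROMPT OBSOLETE;

# End backup mode once the test is done
SQL 'ALTER TABLESPACE users END BACKUP';
```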

Friday, December 22, 2006

CANCEL if you want to work!

Ohoo, this is quite nice! Usually you wouldn't expect that something works only if you specify CANCEL, right? One interesting case is when you want to recreate the control file with the RESETLOGS option. The scenario is simple: you have an old backup which contains all your datafiles and archivelogs and, in addition, you were smart enough to back up the control file to trace. Now suppose you lost all current datafiles and the current redo log files. You have to restore and recover from your backup. The first step is to restore your datafiles to the known location, start the instance in the nomount state and issue the CREATE CONTROLFILE command from the trace file. After the command executes successfully, the instance is brought into the mount state using the freshly created control file. The next step is to use the RECOVER DATABASE command. Below is a sample output:

SQL> startup nomount
ORACLE instance started.

Total System Global Area 264241152 bytes
Fixed Size 1218868 bytes
Variable Size 75499212 bytes
Database Buffers 184549376 bytes
Redo Buffers 2973696 bytes
CREATE CONTROLFILE REUSE DATABASE "DDB" RESETLOGS ARCHIVELOG
MAXLOGFILES 16
MAXLOGMEMBERS 3
MAXDATAFILES 100
MAXINSTANCES 8
MAXLOGHISTORY 292
LOGFILE
GROUP 1 (
'/opt/oracle/oradata/DDB/onlinelog/o1_mf_1_2pdco23c_.log',
'/opt/oracle/flash_recovery_area/DDB/onlinelog/o1_mf_1_2pdco49x_.log'
) SIZE 50M,
GROUP 2 (
'/opt/oracle/oradata/DDB/onlinelog/o1_mf_2_2pdco6j0_.log',
'/opt/oracle/flash_recovery_area/DDB/onlinelog/o1_mf_2_2pdco8ns_.log'
) SIZE 50M,
GROUP 3 (
'/opt/oracle/oradata/DDB/onlinelog/o1_mf_3_2pdcoby5_.log',
'/opt/oracle/flash_recovery_area/DDB/onlinelog/o1_mf_3_2pdcof0o_.log'
) SIZE 50M
-- STANDBY LOGFILE
DATAFILE
'/opt/oracle/oradata/DDB/datafile/o1_mf_system_2rlg719d_.dbf',
'/opt/oracle/oradata/DDB/datafile/o1_mf_undotbs1_2rlg71l3_.dbf',
'/opt/oracle/oradata/DDB/datafile/o1_mf_sysaux_2rlg714v_.dbf',
'/opt/oracle/oradata/DDB/datafile/o1_mf_users_2rlg71kk_.dbf'
CHARACTER SET WE8ISO8859P1
;

SQL> select open_mode from v$database;

OPEN_MODE
----------
MOUNTED

SQL> recover database using backup controlfile;
ORA-00279: change 793731 generated at 12/22/2006 12:00:32 needed for thread 1
ORA-00289: suggestion :
/opt/oracle/flash_recovery_area/DDB/archivelog/1_1_609804799.dbf
ORA-00280: change 793731 for thread 1 is in sequence #1


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 794266 generated at 12/22/2006 12:09:00 needed for thread 1
ORA-00289: suggestion :
/opt/oracle/flash_recovery_area/DDB/archivelog/1_2_609804799.dbf
ORA-00280: change 794266 for thread 1 is in sequence #2
ORA-00278: log file
'/opt/oracle/flash_recovery_area/DDB/archivelog/1_1_609804799.dbf' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 794268 generated at 12/22/2006 12:09:03 needed for thread 1
ORA-00289: suggestion :
/opt/oracle/flash_recovery_area/DDB/archivelog/1_3_609804799.dbf
ORA-00280: change 794268 for thread 1 is in sequence #3
ORA-00278: log file
'/opt/oracle/flash_recovery_area/DDB/archivelog/1_2_609804799.dbf' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00279: change 794271 generated at 12/22/2006 12:09:07 needed for thread 1
ORA-00289: suggestion :
/opt/oracle/flash_recovery_area/DDB/archivelog/1_4_609804799.dbf
ORA-00280: change 794271 for thread 1 is in sequence #4
ORA-00278: log file
'/opt/oracle/flash_recovery_area/DDB/archivelog/1_3_609804799.dbf' no longer
needed for this recovery


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}

ORA-00308: cannot open archived log
'/opt/oracle/flash_recovery_area/DDB/archivelog/1_4_609804799.dbf'
ORA-27037: unable to obtain file status
Linux Error: 2: No such file or directory
Additional information: 3


SQL> ALTER DATABASE OPEN RESETLOGS;
ALTER DATABASE OPEN RESETLOGS
*
ERROR at line 1:
ORA-01113: file 1 needs media recovery
ORA-01110: data file 1:
'/opt/oracle/oradata/DDB/datafile/o1_mf_system_2rlg719d_.dbf'

Oops! File 1 needs media recovery? Why?
Because I ran the recovery without UNTIL CANCEL and never answered CANCEL when asked for the last archived log, so the recovery simply stopped with an error on the missing file. From the Oracle server's point of view, the recovery was unsuccessful. So, the correct approach is:

SQL> recover database until cancel using backup controlfile;
ORA-00279: change 794271 generated at 12/22/2006 12:09:07 needed for thread 1
ORA-00289: suggestion :
/opt/oracle/flash_recovery_area/DDB/archivelog/1_4_609804799.dbf
ORA-00280: change 794271 for thread 1 is in sequence #4


Specify log: {<RET>=suggested | filename | AUTO | CANCEL}
CANCEL
Media recovery cancelled.
SQL> ALTER DATABASE OPEN RESETLOGS;

Database altered.

So, explicit CANCEL does matter in this case!

Thursday, December 21, 2006

Should I Delete All Archives In One Shot?

I was just wondering what the best way is to delete the backed up archives using RMAN. I have to choose between the DELETE INPUT and DELETE ALL INPUT options.
In my environment, all archives are written to two different locations. If I use the DELETE ALL INPUT option then, after the archives from one of the two locations are backed up, all the corresponding archive files are deleted from all available locations. If I use the DELETE INPUT option then only the backed up archives are deleted, just from one of the two locations.
Now, I must admit that using the BACKUP ARCHIVELOG ALL command with DELETE ALL INPUT sounds a little bit scary to me: after it completes successfully I end up with just one backup set containing all backed up archives, but with no archive redundancy. This backup set becomes the single point of failure if a database recovery must take place and those archives are required.
So, the other approach is to use the DELETE INPUT option only. The figure below shows what happens in this case:


At moment T0, there are archives not yet backed up in both locations. The BACKUP command creates “Backup Set1” and deletes the corresponding archives from LOCATION1. So now I have the archives in LOCATION2 and in the backup set as well (redundancy is still 2). Suppose that, after new archives are generated, a new backup is taken. The following figure depicts what happens in this case:


As can be seen, the next BACKUP command creates “Backup Set2”, which contains all archives from LOCATION2 not deleted by the previous BACKUP command and, in addition, the newly generated archive logs from LOCATION1. These backed up archives are deleted, but the redundancy level for them is still two. Much safer, right?
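
The two variants compared above, as they would be typed at the RMAN prompt. This is only a sketch and it assumes, as in the setup described here, that every archived log is written to two destinations:

```rman
# Deletes only the copy of each archived log that was actually backed up;
# the copy in the other destination stays on disk until a later backup picks it up
BACKUP ARCHIVELOG ALL DELETE INPUT;

# Deletes every copy of each backed up archived log, from all destinations
BACKUP ARCHIVELOG ALL DELETE ALL INPUT;
```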

Wednesday, December 20, 2006

Don't Forget to Restore Your Read-only Tablespaces

I know, this is a basic one! But I personally don't restore/recover databases every day, so it is quite easy to forget some basics. So, suppose you have lost all datafiles and some of them belonged to read-only tablespaces. It is important to remember that, by default, RMAN will not restore any datafiles from those read-only tablespaces. Fortunately, RMAN displays a warning, something like “datafile X not processed because file is read-only”, but the restore operation goes on without complaint. So, in order to restore all datafiles, including the ones from the read-only tablespaces, the correct command is: “RESTORE DATABASE CHECK READONLY;”.
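
A minimal sketch of a full restore that also covers the read-only tablespaces, assuming the database is mounted and RMAN is connected to it as target:

```rman
RUN {
  # Also check the read-only datafiles and restore the ones that are missing or unusable
  RESTORE DATABASE CHECK READONLY;
  RECOVER DATABASE;
}
```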