2012年3月11日星期日

Disaster Recovery - Backups Unusable

Here's the scenario:
SQL Server installed (against my recommendation) with Data, Log & Backup all
stored on SAN Disk (my preference would to have Log at least on local disk).
In theory, SAN was snapshot backed up nightly - transparent to OS / DBMS (it
sounded suspect at the time, but was assured that this was the case, so
deferred to the experts).
The experts removed the un-presented the SAN LUNs from the OS because they
"thought it wasn't being used".
We now have the situation where we have no usable recent backups (most
recent is probably two weeks old), we may have a file system snapshot (but
probably not one that MSSQL would consider consistent). We have an SQL
Server which saw the disks pulled from under it, and logged the fact: "Error:
3314, Severity: 21, State: 4. 2007-04-05 10:31:42.16 spid95 During
undoing of a logged operation in database 'critical development database', an
error occurred at log record ID (3127:136:1). Typically, the specific failure
is logged previously as an error in the Windows Event Log service. Restore
the database or file from a backup, or repair the database."
Any suggestions on how best to recover from this situation - the database in
question is a development DB, so we're most interested in the objects rather
than the data. The best we can hope for us to find the db files on the
snapshot disk, and attempt to restore WITH RECOVERY. Is CONTINUE_AFTER_ERROR
relevant in this case?
Should I just take a deep breath, and give up on the last two weeks work (it
is always quicker and better the second time around anyway)?Bernard
Let me understand you.You did/or did not perfrom BACKUP DATABASE command ?
Do you have .BAK files on the system?
Do you get this error only when you try to RESTORE DATABASE? Have you tried
to detach and then attach .MDF/.LDF files?
"Bernard Sheppard" <BernardSheppard@.discussions.microsoft.com> wrote in
message news:2064C417-F46B-4269-A830-5E732C3CAE78@.microsoft.com...
> Here's the scenario:
> SQL Server installed (against my recommendation) with Data, Log & Backup
> all
> stored on SAN Disk (my preference would to have Log at least on local
> disk).
> In theory, SAN was snapshot backed up nightly - transparent to OS / DBMS
> (it
> sounded suspect at the time, but was assured that this was the case, so
> deferred to the experts).
> The experts removed the un-presented the SAN LUNs from the OS because they
> "thought it wasn't being used".
> We now have the situation where we have no usable recent backups (most
> recent is probably two weeks old), we may have a file system snapshot (but
> probably not one that MSSQL would consider consistent). We have an SQL
> Server which saw the disks pulled from under it, and logged the fact:
> "Error:
> 3314, Severity: 21, State: 4. 2007-04-05 10:31:42.16 spid95 During
> undoing of a logged operation in database 'critical development database',
> an
> error occurred at log record ID (3127:136:1). Typically, the specific
> failure
> is logged previously as an error in the Windows Event Log service. Restore
> the database or file from a backup, or repair the database."
> Any suggestions on how best to recover from this situation - the database
> in
> question is a development DB, so we're most interested in the objects
> rather
> than the data. The best we can hope for us to find the db files on the
> snapshot disk, and attempt to restore WITH RECOVERY. Is
> CONTINUE_AFTER_ERROR
> relevant in this case?
> Should I just take a deep breath, and give up on the last two weeks work
> (it
> is always quicker and better the second time around anyway)?|||Hi Uri, thanks for the response. See inline below.
"Uri Dimant" wrote:
> Bernard
> Let me understand you.You did/or did not perfrom BACKUP DATABASE command ?
Yes. Daily full backups were performed.
> Do you have .BAK files on the system?
No, when the SAN storage was removed, the disk where the .BAK files were
stored (we kept a weeks worth of daily backus on disk) was returned to the
SAN pool and re-allocated. The .BAK files were *supposed* to be being backed
up to tape, but these backups were unsuccessful. The only BAK we have is
three weeks old - one that was manually moved to local disk so it could kept
as a point in time record.
> Do you get this error only when you try to RESTORE DATABASE?
The error was reported by SQL Server when the SAN disk was removed. We
haven't attempted to restore the database because we have no recent backup
available.
> Have you tried to detach and then attach .MDF/.LDF files?
Not yet, still waiting for them to be restored from disk - but I expect them
to - at best - be in an inconsistent state. So, I guess the first attempt
(after making copies of the .MDF/.LDF files) will be attaching. However, SQL
already has the DBs marked as suspect (because the disk was removed).
Once the disk is restored and the files available, what is the best course
of action to change the DBs from suspect - is it to restore WITH RECOVERY
without specifying a backup device to restore from?
Thanks again.|||Hi Bernhard,
First of all, you might have to get rid of the old entry for the
database. You can try running a DROP DATABASE command to see if that
will remove the database entry. I must admit that I don't know if that
will work when the database isn't operational.
The next step would be to run sp_attach_db and attach the mdf/ldf files
to a database. If you want, you can try to attach the files as a
different database name if you can't get rid of the old one.
Secondly, it's not necessarily a bad idea to have all you database files
in the SAN. Typically a SAN will contain far more spindles than if you
just use internal disks in the server. That will in most cases give you
a better performance than using local disks. Depending on the type of
SAN you are using, it can be good or bad to have all the files in the
same diskgroup in the SAN. E.g. a HP EVA SAN will just have one big pool
of physical spindles in a diskgroup and then you can define LUNS in this
diskgroup but basically it's using a little part of all the disks. The
advantage with this is that you will have lots of spindles available.
--
Regards
Steen Schlüter Persson
Database Administrator / System Administrator
Bernard Sheppard wrote:
> Hi Uri, thanks for the response. See inline below.
> "Uri Dimant" wrote:
>> Bernard
>> Let me understand you.You did/or did not perfrom BACKUP DATABASE command ?
> Yes. Daily full backups were performed.
>> Do you have .BAK files on the system?
> No, when the SAN storage was removed, the disk where the .BAK files were
> stored (we kept a weeks worth of daily backus on disk) was returned to the
> SAN pool and re-allocated. The .BAK files were *supposed* to be being backed
> up to tape, but these backups were unsuccessful. The only BAK we have is
> three weeks old - one that was manually moved to local disk so it could kept
> as a point in time record.
>> Do you get this error only when you try to RESTORE DATABASE?
> The error was reported by SQL Server when the SAN disk was removed. We
> haven't attempted to restore the database because we have no recent backup
> available.
>
>> Have you tried to detach and then attach .MDF/.LDF files?
> Not yet, still waiting for them to be restored from disk - but I expect them
> to - at best - be in an inconsistent state. So, I guess the first attempt
> (after making copies of the .MDF/.LDF files) will be attaching. However, SQL
> already has the DBs marked as suspect (because the disk was removed).
> Once the disk is restored and the files available, what is the best course
> of action to change the DBs from suspect - is it to restore WITH RECOVERY
> without specifying a backup device to restore from?
> Thanks again.|||""Steen Schlüter Persson (DK)"" <steen@.REMOVE_THIS_asavaenget.dk> wrote in
message news:%23EGQhNCfHHA.4128@.TK2MSFTNGP05.phx.gbl...
> Hi Bernhard,
>
Steen, I'm going to quibble a bit here, but in general I think we're in
agreement.
> First of all, you might have to get rid of the old entry for the database.
> You can try running a DROP DATABASE command to see if that will remove the
> database entry. I must admit that I don't know if that will work when the
> database isn't operational.
Actually I wouldn't do this. I'll explain why in a second.
> The next step would be to run sp_attach_db and attach the mdf/ldf files to
> a database. If you want, you can try to attach the files as a different
> database name if you can't get rid of the old one.
>
I'd say the FIRST step, before restarting SQL Server, etc, is to make (if
possible) a backup of the MDF/LDF files (assuming he can get them at all).
It's almost always to have a copy in cases like this. Best case, you toss
the copy. Worse case, if other recovery steps fail, you can go back to what
you had.
Now, as for dropping it. I wouldn't since now that I have a backup, the
first thing I'd try is simply to start SQL Server. I've been pleasantly
surprised how often SQL Server can recover from cases like this. Especially
if the LDF is intact. In this case, no need to drop anything.
If that doesn't work, then I'd probably either try to drop the database or
simply attach it as a different name. (We can also rename it later once we
get things working.)
And if none of this works, call Microsoft, they can do some amazing things
sometimes.
So, in general, I like to not close off any option until I definitely have
to. Making an initial backup ads options. Not dropping, adds options, etc.
> Secondly, it's not necessarily a bad idea to have all you database files
> in the SAN. Typically a SAN will contain far more spindles than if you
> just use internal disks in the server. That will in most cases give you a
> better performance than using local disks. Depending on the type of SAN
> you are using, it can be good or bad to have all the files in the same
> diskgroup in the SAN. E.g. a HP EVA SAN will just have one big pool of
> physical spindles in a diskgroup and then you can define LUNS in this
> diskgroup but basically it's using a little part of all the disks. The
> advantage with this is that you will have lots of spindles available.
>
I tend to agree. Also, in the event of a failure of the server, if
everything is on the SAN, it's pretty straightforward to rehome the LUNS to
a new machine and viola, you're back up and running in minutes.
> --
> Regards
> Steen Schlüter Persson
> Database Administrator / System Administrator
>
> Bernard Sheppard wrote:
>> Hi Uri, thanks for the response. See inline below.
>> "Uri Dimant" wrote:
>> Bernard
>> Let me understand you.You did/or did not perfrom BACKUP DATABASE command
>> ?
>> Yes. Daily full backups were performed.
>> Do you have .BAK files on the system?
>> No, when the SAN storage was removed, the disk where the .BAK files were
>> stored (we kept a weeks worth of daily backus on disk) was returned to
>> the SAN pool and re-allocated. The .BAK files were *supposed* to be
>> being backed up to tape, but these backups were unsuccessful. The only
>> BAK we have is three weeks old - one that was manually moved to local
>> disk so it could kept as a point in time record.
>> Do you get this error only when you try to RESTORE DATABASE?
>> The error was reported by SQL Server when the SAN disk was removed. We
>> haven't attempted to restore the database because we have no recent
>> backup available.
>>
>> Have you tried to detach and then attach .MDF/.LDF files?
>> Not yet, still waiting for them to be restored from disk - but I expect
>> them to - at best - be in an inconsistent state. So, I guess the first
>> attempt (after making copies of the .MDF/.LDF files) will be attaching.
>> However, SQL already has the DBs marked as suspect (because the disk was
>> removed).
>> Once the disk is restored and the files available, what is the best
>> course of action to change the DBs from suspect - is it to restore WITH
>> RECOVERY without specifying a backup device to restore from?
>> Thanks again.
Greg Moore
SQL Server DBA Consulting Remote and Onsite available!
Email: sql (at) greenms.com http://www.greenms.com/sqlserver.html|||> I've been pleasantly
> surprised how often SQL Server can recover from cases like this. Especially
> if the LDF is intact.
Good point. I've got a feeling that I won't be so lucky (assuming I've got
the MDFs and LDFs) because the last thing SQL logged to the error log before
shutting down was the inability to undo a logged operation, but worth a try.
> And if none of this works, call Microsoft, they can do some amazing things
> sometimes.
>
Another good point, one that I'll pursue if I can't get any further myself.
> I tend to agree. Also, in the event of a failure of the server, if
> everything is on the SAN, it's pretty straightforward to rehome the LUNS to
> a new machine and viola, you're back up and running in minutes.
>
Don't get me wrong - I'm a big fan of SANs, and where a server fails, this
is a great feature, but I think they have to be used correctly:
* Log & DB should be on separate physical infrastructure for maximum
recoverability - that can be separate HBAs, brocade switches, yada yada yada,
or alternatively on (say) local SCSI attached external disk. When you lose
one, you don't lose the other. SQL Server copes with that much better than
losing both at the same time.
I've had a SQL server loose a 1TB database after someone literally tripped
over the fibres and ripped them out in the middle of a significant data load.
Once we got the disk re-connected, SQL quite happily rolled forward/back,
and apart from a precautionary backup & integrity check, we just kept
trucking.
* Backups that are dumped to disk should be again on separate physical
infrastructure to the data. If you lose the disk before the backups are on
tape, you haven't lost the backup.
* Tape Backups are not considered successful until they HAVE BEEN RESTORED
(yes, I'm shouting now).
In this case, we simply got told that: Everything goes on the SAN, we manage
it, everything is backed up each night to disk and then to tape. I asked if
that was a snapshot backup and was told yes. Turns out the person answering
that question didn't understand it, and no, it was not a SAN based disk
mirror snapshhot, it was a normal "no open files" backup to tape - and even
that wasn't working.
So, I've got no disk, and I've got no backup, and I've got reduced chances
for a simple recovery if I can get the MDFs and LDFs back because they were
both lost at the same time.
Thanks again for your suggestions.|||"Bernard Sheppard" <BernardSheppard@.discussions.microsoft.com> wrote in
message news:11BB448F-18A3-421D-956D-E145C68DCF33@.microsoft.com...
>> I've been pleasantly
>> surprised how often SQL Server can recover from cases like this.
>> Especially
>> if the LDF is intact.
> Good point. I've got a feeling that I won't be so lucky (assuming I've
> got
> the MDFs and LDFs) because the last thing SQL logged to the error log
> before
> shutting down was the inability to undo a logged operation, but worth a
> try.
>
Actually I've recovered from a number of those. BUT, that was a case of a
RAID controller that was a bit tempermental and would take an entire RAID
off-line in one-swoop. Not sure what the state of your LUNS may be. Cross
your fingers. :-)
>> And if none of this works, call Microsoft, they can do some amazing
>> things
>> sometimes.
> Another good point, one that I'll pursue if I can't get any further
> myself.
>> I tend to agree. Also, in the event of a failure of the server, if
>> everything is on the SAN, it's pretty straightforward to rehome the LUNS
>> to
>> a new machine and viola, you're back up and running in minutes.
> Don't get me wrong - I'm a big fan of SANs, and where a server fails, this
> is a great feature, but I think they have to be used correctly:
> * Log & DB should be on separate physical infrastructure for maximum
> recoverability - that can be separate HBAs, brocade switches, yada yada
> yada,
> or alternatively on (say) local SCSI attached external disk. When you
> lose
> one, you don't lose the other. SQL Server copes with that much better
> than
> losing both at the same time.
Agreed.
> I've had a SQL server loose a 1TB database after someone literally tripped
> over the fibres and ripped them out in the middle of a significant data
> load.
> Once we got the disk re-connected, SQL quite happily rolled forward/back,
> and apart from a precautionary backup & integrity check, we just kept
> trucking.
Tragic, but sweet in its own way :-/
> * Backups that are dumped to disk should be again on separate physical
> infrastructure to the data. If you lose the disk before the backups are
> on
> tape, you haven't lost the backup.
> * Tape Backups are not considered successful until they HAVE BEEN RESTORED
> (yes, I'm shouting now).
Agreed.
> In this case, we simply got told that: Everything goes on the SAN, we
> manage
> it, everything is backed up each night to disk and then to tape. I asked
> if
> that was a snapshot backup and was told yes. Turns out the person
> answering
> that question didn't understand it, and no, it was not a SAN based disk
> mirror snapshhot, it was a normal "no open files" backup to tape - and
> even
> that wasn't working.
Argh. that really sucks.
> So, I've got no disk, and I've got no backup, and I've got reduced chances
> for a simple recovery if I can get the MDFs and LDFs back because they
> were
> both lost at the same time.
Yeah. Unfortunate.
> Thanks again for your suggestions.
No problem, best of luck.
Greg Moore
SQL Server DBA Consulting Remote and Onsite available!
Email: sql (at) greenms.com http://www.greenms.com/sqlserver.html|||Greg D. Moore (Strider) wrote:
> ""Steen Schlüter Persson (DK)"" <steen@.REMOVE_THIS_asavaenget.dk> wrote in
> message news:%23EGQhNCfHHA.4128@.TK2MSFTNGP05.phx.gbl...
>> Hi Bernhard,
> Steen, I'm going to quibble a bit here, but in general I think we're in
> agreement.
>
Hi Greg
You're right that he should keep a copy of the MDF/LDF files - only
reason for why I didn't specifically wrote is was that he was going to
recover them from a backup anyway and then I assumed he would still have
a copy...:-).
Regards
Steen Schlüter Persson
Database Administrator / System Administrator

没有评论:

发表评论