Is there a way to get the disk space occupied by certain rows?
Thank You.
Regards,
A Friend.
SELECT DATALENGTH(ColA) + DATALENGTH(ColB)
FROM SomeTable
WHERE ConditionHere
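As a slightly fuller (and purely illustrative) version of the same idea, sum DATALENGTH over every column you care about for the rows in question; note this counts only the in-row data bytes, not row or page overhead. Table, key and column names below are made up:
-- Hypothetical table, key and columns; adjust to your schema.
SELECT Id,
       DATALENGTH(ColA) + DATALENGTH(ColB) + DATALENGTH(ColC) AS RowDataBytes
FROM SomeTable
WHERE Id IN (1, 2, 3)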
Jens k. Suessmeyer.
http://www.sqlserver2005.de
I have a need to filter out certain rows from my data stream. I cannot apply the filter against the source data using my DataReader component, due to some constraints in the source system. Therefore, I must filter the data out after it enters my data stream (trust me on this part).
I have created a data flow that uses the Conditional Split transformation to do this. I created one condition that matches the rows I want to discard. I then connected the Default output stream to my target table. I have simply left the "discard" output disconnected. This appears to do what I want.
My question is: is it OK to leave outputs disconnected in this fashion? It isn't really apparent when viewing the package that the conditional split is discarding rows. Is there a better way to handle this situation? For now I've just added an annotation to the package that describes what is happening.
Thanks for any help
You can use a rowcount transform to make it more 'evident'. That offers the extra benefit of getting the number of rows you discarded, which comes in handy for auditing purposes.
There is a 3rd-party adapter here that you may want to look at as well (I have not used it):
http://www.sqlis.com/56.aspx
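If you do capture the discarded-row count in a package variable via the Row Count transform, a small Execute SQL Task can then write it to an audit table afterwards. This is only a sketch: the table and columns are invented, and the two ? markers assume an OLE DB connection, mapped to System::PackageName and the user variable the Row Count transform populated.
-- Hypothetical audit table; the ? placeholders are mapped in the
-- Execute SQL Task's parameter mapping page.
INSERT INTO dbo.PackageAudit (PackageName, DiscardedRows, LoadedAt)
VALUES (?, ?, GETDATE())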
|||Yes, you have implemented that perfectly. The only thing I would add is to hook your "filtered" rows up to a Row Count transformation. This will do two things. One, it will let you see that records are going down that flow when debugging. Two, it will let you capture the number of rows that went through there in a variable so that you can log it later if you wish.|||As the other guys have said, you have done this in exactly the right way.
I disagree with Phil slightly though (sorry Phil). I wouldn't bother connecting the dangling output to a rowcount component. This will simply make the data-flow do some unnecessary work. If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered-out rows without introducing an additional component by subtracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.
-Jamie
|||
Jamie Thomson wrote:
If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered-out rows without introducing an additional component by subtracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.
-Jamie
Good idea! How easy is it then to capture that and use it in auditing from a control flow task?|||
Phil Brammer wrote:
Jamie Thomson wrote: If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered-out rows without introducing an additional component by subtracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.
-Jamie
Good idea! How easy is it then to capture that and use it in auditing from a control flow task?
You can get hold of it in the event handler. You'd have to parse it out of the message but that's no biggie.
Not as easy as rowcount though. Options...always options!
-Jamie
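If the package logs to the SQL Server provider, another route (a rough sketch only: the default log table in SSIS 2005 is dbo.sysdtslog90, and the exact message layout can vary by build) is to pull the counts back out of the log table after the run:
-- Each OnPipelineRowsSent row carries the rows-sent figure at the end of
-- the message column, so a little string parsing is needed to extract it.
SELECT source, starttime, message
FROM dbo.sysdtslog90
WHERE event = 'OnPipelineRowsSent'
ORDER BY starttime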
|||It's amazing how deep you can get in SSIS... Many dark corners yet unexplored!
FHRxxxxxxxxxxxxxxxxxxxxx
SHR1xxxxxxxxxxxxxxxxxxxx
DataRow
DataRow
STR1xxxxxxxxxxxxxxxxxxxx
SHR2xxxxxxxxxxxxxxxxxxxx
DataRow
DataRow
DataRow
STR2xxxxxxxxxxxxxxxxxxxx
FTRxxxxxxxxxxxxxxxxxxxxx
Where FHR=File Header, SHR=Section Header, STR=Section Trailer & FTR=File Trailer
How can I discard all the header and trailer rows?
I have the connection manager defined as ragged right and have defined the fixed-length columns.
I know I can tell the flat file connection manager to discard n header records. I can also tell it to redirect rows which are truncated to "nowhere", but some headers & trailers still get through (they're longer than data rows).
Is there any other way I can discard these rows? Something like "if this row starts with SHR/STR/etc. ignore it"?
Ta.
Greg.
Greg,
You can use a conditional split transform to filter out these rows, but this is AFTER they have been loaded into the pipeline. If you want to discard them BEFORE they get into the pipeline then you're probably into script source component or custom source adapter territory.
Donald Farmer has a fantastic example of how to load "irregular" files such as these using the script component in his book - you can find it on Amazon.
-Jamie
|||
Hey Greg,
the easiest way is to use the Conditional Split component.
Your condition looks like this example:
SUBSTRING(Input,1,3) == "SHR"
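To catch every header and trailer type in a single condition (a sketch only; "Input" is just whatever your flat-file column is called in the flow), you could route the matches to a discard output with:
SUBSTRING(Input,1,3) == "FHR" || SUBSTRING(Input,1,3) == "SHR" || SUBSTRING(Input,1,3) == "STR" || SUBSTRING(Input,1,3) == "FTR"
Rows matching that condition go to an output you leave disconnected (or feed to a Row Count), and the default output carries only data rows.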
For more power, take a look at Donald Farmer's scripting book.
I like the scripting component!!!
Loom
|||Thanks guys. Great stuff.
Greg.
The data is dumped and imported fresh every day.
Obviously I don't want my production Transaction Log getting cluttered up with log records for 70 million rows every day.
Is there any way to avoid logging to the transaction log with an SSIS task?|||Are you using bulk insert or are you running 70 million insert statements? There is a variety of ways to truncate / shrink the transaction log after you are done processing records.|||
Greg Van Mullem wrote:
Are you using bulk insert or are you running 70 million insert statements? There is a variety of ways to truncate / shrink the transaction log after you are done processing records.
I am going to be using bulk inserts.|||
You can either put your database into simple recovery, which provides no point-in-time recovery, and then put it back into "full" mode and do a full backup after you're finished.
With "simple" you will only be able to recover from your previous backup up to the point you flipped into simple mode.
The other option is to use BULK LOGGED mode. This mode logs extent allocations; once the load is completed you should put your database back into full mode, at which point, when you back up the log, all pages/extents changed during the bulk-logged period are written to the backup.
Your transaction log will always grow until, in simple mode, a checkpoint occurs (at which point it is emptied of committed transactions) or, in bulk-logged/full mode, the transaction log is backed up. If you have one transaction for all the work then the transaction log will need to be big enough to handle all the changes made in that transaction.
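As a rough sketch of the bulk-logged route described above (database, path and names are placeholders, not tested against your setup):
-- Switch to bulk-logged before the nightly load, then back to full and
-- back up the log so the backup chain stays intact. Names are hypothetical.
ALTER DATABASE MyStagingDb SET RECOVERY BULK_LOGGED
GO
-- ... run the 70-million-row bulk load here (fast load / BULK INSERT) ...
ALTER DATABASE MyStagingDb SET RECOVERY FULL
GO
BACKUP LOG MyStagingDb TO DISK = N'D:\Backups\MyStagingDb_afterload.trn'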
|||Have you considered solving this problem with a hardware purchase instead of software tweaks? Buy big disk drives and let the transaction log explode in size. If it's too slow then buy faster disk drives or servers. This is often the most cost-effective approach these days. More often than not, when I run the numbers on an issue like this, an expensive hardware purchase turns out to be cheaper, faster and easier! Just a thought.
Later,
Greg Van Mullem|||
Use the BATCHSIZE parameter (I hope BULK INSERT has one, as the bcp utility does). This will lead to a checkpoint. If you now define an alert for the database (e.g. log is 75 percent used) and you define a job which will dump the transaction log, you should have no problems with it.
You may have to experiment with the percent value. That's it.
The advantage is that you do not change the recovery model of the database, and even your inserts will be part of the backup.
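For illustration only (table, file and sizes are invented; check the exact options available on your build), a batched load might look like:
-- Each 100,000-row batch commits as its own transaction, so a log backup
-- job running alongside can keep the log from growing unchecked.
BULK INSERT dbo.DailyLoad
FROM N'D:\Feeds\daily_extract.dat'
WITH (BATCHSIZE = 100000, TABLOCK)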
Regards
Norbert
Hello,
It seems we have a table that is on merge replication; the table by default doesn't have a PK on it. This table only has about 50 rows of data. Somewhere during the initialization process it's trying to create a PK by default on the client end. My assumption is that during the creation of the snapshot a script is being generated that we do not want. How can we prevent this from happening, as it's creating a PK violation?
The schema script 'if object_id(N'[dbo].[checklistimported]') is not null exec('ALTER TABLE [dbo].[checklistimported] ADD CONSTRAINT
PK_listimported PRIMARY KEY CLUSTERED
(
projectid,
chklstid,
unit,
lot
) WITH( STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
')' could not be propagated to the subscriber due to an error in creating index. A possible cause of this error is one or more join filters has the join_unique_key property se
server SQL2k5 SP1
client SQL Express
Actually the PK should have existed at the publisher side. If it does not exist at the publisher side, it will not be created at the subscriber side.
Are the two servers (publisher and subscriber) set to the same case sensitivity? I suspect that's the problem.
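A quick way to confirm the first point at the publisher (a sketch only; it just lists any primary key constraint defined on the table named in the error):
-- Returns a row if dbo.checklistimported already has a primary key.
SELECT name, type_desc
FROM sys.key_constraints
WHERE parent_object_id = OBJECT_ID(N'dbo.checklistimported')
  AND type = 'PK'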