显示标签为“filter”的博文。显示所有博文
显示标签为“filter”的博文。显示所有博文

2012年3月19日星期一

discarding rows - best practice?

I have a need to filter out certain rows from my data stream. I cannot apply the filter against the source data using my DataReader component, due to some constraints in the source system. Therefore, I must filter the data out after it enters my datastream (trust me on this part).

I have created a data flow that uses the Conditional Split transformation to do this. I created one condition that matches the rows I want to discard. I then connected the Default output stream to my target table. I have simply left the "discard" output disconnected. This appear to do what I want.

My question is: is it OK to leave outputs disconnected in this fashion? It isn't really apparent when viewing the package that the conditional split is discarding rows. Is there a better way to handle this situation? For now I've just added an annotation to the package that describes what is happening.

Thanks for any help

You can use a rowcount transform to make it more 'evident'. That offers the extra benefit of getting the number of rows you discarded; which comes handy for auditing proposes.

There is a 3rd party adapter here that you may want to look as well(i have not used it):

http://www.sqlis.com/56.aspx

|||Yes, you have implemented that perfectly. The only thing I would add is to hook your "filtered" rows up to a Row Count transformation. This will do two things. One, it will let you see that records are going down that flow when debugging. Two it will let you capture the number of rows that went through there in a variable so that you can log it later if you wish.|||

As the other guys have said, you have done this in exactly the right way.

I disagree with Phil slightly though (sorry Phil ). I wouldn't bother connecting the dangling output to a rowcount component. This will simply make the data-flow do some unnecassary work. If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered out rows with introducing an additional component by substracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.

-Jamie

|||

Jamie Thomson wrote:

If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered out rows with introducing an additional component by substracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.

-Jamie

Good idea! How easy is it then to capture that and use it in auditing from a control flow task?|||

Phil Brammer wrote:

Jamie Thomson wrote:

If you DO want to count the number of filtered rows then sure, use a rowcount component - although you can still determine the number of filtered out rows with introducing an additional component by substracting the number of output rows from the number of input rows. These two values are available by logging the OnPipelineRowsSent event.

-Jamie

Good idea! How easy is it then to capture that and use it in auditing from a control flow task?

You can get hold of it in the eventhandler. You'd have to parse it out of the message but that's no biggie.

Not as easy as rowcount though. Options...always options!

-Jamie

|||It's amazing how deep you can get in SSIS... Many dark corners yet unexplored!

2012年3月8日星期四

Disappearing Parameterised Filters and Filter Joins in Publication Properties page of replicatio

Hi,

We have an issue with our replication configuration when viewed through replication monitor. Parameterised Filters and joined filters don't appear in the gui. However, when we script the publication all the filters are present.

This issue only seems to occur when we have a remote distributor.

I should also point out that we have a merge push topology that uses a custom RMO synchronisation component on a separate server to either the publisher or the distributor. Also all the databases in the topology are called the same name. This has caused us other issues relating to this topology in particular so I raise it here as well although I don't expect it to be the case in this instance.

Any help would be greatly appreciated in clarifying this matter.

To get better response, can you submit this bug/issue to http://connect.microsoft.com? We can track it better there, thanks!

Disappearing Parameterised Filters and Filter Joins in Publication Properties page of replic

Hi,

We have an issue with our replication configuration when viewed through replication monitor. Parameterised Filters and joined filters don't appear in the gui. However, when we script the publication all the filters are present.

This issue only seems to occur when we have a remote distributor.

I should also point out that we have a merge push topology that uses a custom RMO synchronisation component on a separate server to either the publisher or the distributor. Also all the databases in the topology are called the same name. This has caused us other issues relating to this topology in particular so I raise it here as well although I don't expect it to be the case in this instance.

Any help would be greatly appreciated in clarifying this matter.

To get better response, can you submit this bug/issue to http://connect.microsoft.com? We can track it better there, thanks!

2012年2月14日星期二

disable filter

i am having a query in front end like this..

func()

select * from table where cond1 and cond2 and col.port in <string> ... goes

end

i passing this string from other function, i may be not be needing this filter(underlined) for some functionality, but i will be using the same function for both the functionalities

i need a string that should tell the sql server to allow all the possible values for that column, so by disabling the filter which i am using...

pls tell me URGENT
Thanks in advance

In your function check if string is empty or not and call a different SQL. ie:

if < string is empty >
select * from myTable where cond1 and cond2
else
select * from myTable where cond1 and cond2 and col.port in < string > ....
endif

|||

I agree with the other answer, but it seems too easy.. Is this a user defined function in T-SQL? Or dynamically built using your .NET code? If dynamic then it is essential that you leave off the (col.port in <string> ... ) part. You will force the thing to go through a table scan otherwise.

If it is in a user defined function, they you might be able to use a value of the string, like NULL and add
and (col.port in <string> or <string> is null)