CD2 When is filter going to be made available?

KeithSmith_au
Community Contributor

The documentation contains the following entry:

Filter

Identifies a subset of data to fetch from a table.

(This feature is not currently implemented.)

 

When is this going to be made available? This feature would be extremely beneficial: it would reduce the amount of data that needs to be moved and make our implementation much more resilient.

For efficiency, we use CSV format. The ability to specify only the columns we actually need would both reduce the volume and mean that schema updates introducing new columns would not change the format of what our query returns unless we made explicit changes to request the new columns.

Now that web_logs are available, the row-level "where" filter is of extreme importance. We are not really interested in trying to accumulate all the web_logs. I estimate that 30 days of web logs at our current activity rate is of the order of 7 TB in the zipped format, blowing out to over 26 TB when decompressed. Spread over 30 days, and given that our activity occurs on weekdays, we are talking about roughly 1 TB of data per day, which is way too much to realistically move around and process, let alone store for extended periods.
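For anyone who wants to check the sizing, here is a rough sketch of the arithmetic. The 7 TB and 26 TB totals are my estimates from above; the figure of 22 weekdays in a 30-day window is an assumption added for illustration.

```python
# Rough sizing of the web_logs volume, using the estimates quoted above.
COMPRESSED_TB = 7.0      # estimated 30-day total, zipped
DECOMPRESSED_TB = 26.0   # estimated 30-day total, decompressed
WINDOW_DAYS = 30
ACTIVE_DAYS = 22         # assumption: roughly 22 weekdays in a 30-day window


def daily_volume_tb(total_tb: float, days: int) -> float:
    """Average volume per day if the total is spread evenly over `days`."""
    return total_tb / days


expansion_ratio = DECOMPRESSED_TB / COMPRESSED_TB
per_calendar_day = daily_volume_tb(DECOMPRESSED_TB, WINDOW_DAYS)
per_weekday = daily_volume_tb(DECOMPRESSED_TB, ACTIVE_DAYS)

print(f"expansion ratio: {expansion_ratio:.1f}x")
print(f"per calendar day: {per_calendar_day:.2f} TB")
print(f"per weekday: {per_weekday:.2f} TB")
```

Either way of averaging lands in the region of 1 TB of decompressed data per day.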

There are some subsets (activity where users are acting-as) that we would want to store for audit purposes, but that is a tiny fraction of the total. We would really like to be able to filter for this activity (which could easily be done). That would be a regular fetch and update.
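To show how simple the predicate is, here is a minimal sketch of the acting-as filter applied client-side to CSV rows. The column names `user_id` and `real_user_id`, and the rule that `real_user_id` is only populated when someone is acting-as, are assumptions for illustration and not taken from the documented schema. The sample rows are made up.

```python
import csv
import io
from typing import Iterable, Iterator


def acting_as_rows(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield only rows where one user was acting as another.

    Assumption: `real_user_id` is populated only when someone is
    acting-as, and then differs from `user_id`. Both column names are
    hypothetical placeholders.
    """
    for row in rows:
        real_user = (row.get("real_user_id") or "").strip()
        acted_as = (row.get("user_id") or "").strip()
        if real_user and real_user != acted_as:
            yield row


# Tiny in-memory CSV standing in for a downloaded web_logs file.
sample = io.StringIO(
    "timestamp,user_id,real_user_id,url\n"
    "2024-01-01T09:00:00Z,101,,/courses/1\n"
    "2024-01-01T09:05:00Z,102,900,/courses/2\n"
    "2024-01-01T09:10:00Z,103,103,/courses/3\n"
)
audit_rows = list(acting_as_rows(csv.DictReader(sample)))
print(len(audit_rows))
```

Doing this on our side still means downloading everything first, which is exactly the cost a server-side filter would avoid.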

It would also be useful to be able to filter for ad-hoc queries.  We don't want to actually replicate the data, but being able to query by user_id over the last 30 days would enable us to make use of the logs in a meaningful way.
