One of our tables, requests, has over 1,000,000,000 rows in it. Since Redshift doesn't support indexes, it's a very cumbersome table to work with.
What is the plan for scale in the future?
How would you like to use this data? We are considering exporting a few rollups that would be easier to work with, and we would love to hear from you and others about which rollups make sense.
We will also be adding columns for year, month, quarter, and day, which should make the table more approachable and let Redshift's columnar architecture query the data more efficiently.
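To illustrate the kind of rollup those date columns would enable, here is a minimal local sketch in Python. The record fields (`timestamp`, `course_id`) and the grouping are assumptions about what a `GROUP BY year, quarter, course_id` on the new columns would produce, not the actual export format.

```python
from collections import Counter
from datetime import date

# Hypothetical request records; in Redshift these would be rows of the
# requests table once the year/quarter columns exist.
requests = [
    {"timestamp": date(2015, 2, 3), "course_id": 101},
    {"timestamp": date(2015, 2, 9), "course_id": 101},
    {"timestamp": date(2015, 5, 1), "course_id": 202},
    {"timestamp": date(2015, 5, 2), "course_id": None},  # many rows have no course
]

def quarter(d):
    """Calendar quarter (1-4) for a date."""
    return (d.month - 1) // 3 + 1

# Rollup: request counts per (year, quarter, course_id) -- the same
# aggregation a GROUP BY on the proposed columns would do server-side.
rollup = Counter(
    (r["timestamp"].year, quarter(r["timestamp"]), r["course_id"])
    for r in requests
)

print(rollup[(2015, 1, 101)])  # 2 requests for course 101 in Q1 2015
```

A pre-aggregated export of this shape would be a few orders of magnitude smaller than the raw requests table.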
We would like to use Redshift to compare the number of requests across various variables (e.g. courses, departments, or campuses of the university), but our requests table has a little over 1 billion rows (685 million of which have a null course_id), and it's pretty much impossible to get anything out of it except at a fine grain, e.g. requests for a single day. The most I've been able to get out of the table is the requests for all courses offered by a single department in a single quarter (2.8 million rows). I'm using SQL Workbench/J with a heap size of 2 GB, and it runs out of memory on larger queries, such as the number of requests in one quarter for our smallest campus.
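One thing worth noting about the heap-size problem: SQL Workbench/J runs out of memory because it materializes every result row on the client, so pushing the aggregation into Redshift means only summary rows come back. A sketch of that shape of query, with assumed table and column names (`requests`, `timestamp_year`, `campus_id`):

```python
# The aggregation happens server-side; the client receives one row per
# year instead of millions of raw request rows.
query = """
SELECT timestamp_year, COUNT(*) AS request_count
FROM requests
WHERE campus_id = %(campus_id)s
GROUP BY timestamp_year
ORDER BY timestamp_year
"""

# With a Postgres-protocol driver such as psycopg2 this would run as:
#   cur.execute(query, {"campus_id": 3})
print(query.strip())
```

The same GROUP BY pasted directly into SQL Workbench/J should stay well under the 2 GB heap, since the result set is tiny even when the scanned table is not.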
I was exploring that table as a way of measuring per-student, per-course "engagement": tracking how often students view grades, assignments, submissions, external tools, etc.
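An engagement metric like that is itself a rollup. A minimal sketch, assuming each page view carries a user id and a category field (the field name `controller` is an assumption about how views are categorized):

```python
from collections import Counter

# Hypothetical page-view records; "controller" stands in for whatever
# field categorizes what the student was viewing.
page_views = [
    {"user_id": 1, "controller": "gradebooks"},
    {"user_id": 1, "controller": "assignments"},
    {"user_id": 1, "controller": "gradebooks"},
    {"user_id": 2, "controller": "submissions"},
]

# Engagement rollup: view counts per (student, category).
engagement = Counter((v["user_id"], v["controller"]) for v in page_views)

print(engagement[(1, "gradebooks")])  # student 1 viewed grades twice
```

A per-(student, course, category, day) rollup exported from the requests table would support this use case without any billion-row scans.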
For the beta, I've been trying to replace an app that works against the API, in this case /api/v1/users/sis_login_id:%s/page_views?start_time=%s. For the most part a rollup by page category works, but we also do counts per assignment and per submission within an assignment.
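For reference, the `%s` interpolation above can be done safely with standard URL escaping. The endpoint path comes from the post; the base URL, login id, and timestamp below are made-up placeholders, and authentication headers are omitted:

```python
from urllib.parse import quote, urlencode

def page_views_url(base, sis_login_id, start_time):
    """Build the per-user page_views URL, escaping the login id and
    the start_time query parameter."""
    path = f"/api/v1/users/sis_login_id:{quote(sis_login_id)}/page_views"
    return f"{base}{path}?{urlencode({'start_time': start_time})}"

print(page_views_url("https://canvas.example.edu",
                     "jdoe", "2015-01-01T00:00:00Z"))
```

Escaping matters here because ISO 8601 timestamps contain `:` characters that must be percent-encoded in the query string.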