Discrepancy between Requests and Web_logs
I was working on switching over a process that previously used CD1 requests to show the breakdown of the operating systems and browsers used by our faculty and students. When I compared the data, I noticed there was substantially less data in the web_logs table than in requests. Luckily we have been running both processes, so I was able to compare the same date ranges, and the difference between the two has become alarming.
I have yet to identify a pattern in which events make it and which do not. We zoomed in on a specific session where we knew what the traffic was for that user, and random events are simply missing from CD2 web_logs, while CD1 requests shows the expected events, as do page views in the GUI. To be clear, the volume of data I am referring to far exceeds what could be considered transactional losses or any other small losses that can occur in big data ETL/ELT pipelines.
When CD2 web_logs was first released, we checked this regularly and the differences were within an acceptable margin. Over the last few months the margin has become alarmingly large.
* I excluded July since web_logs did not release on the 1st and I would be comparing a full month to a partial.
Unfortunately, September was about when we stopped comparing the data between the two and began trusting that they were roughly equivalent.
We are still checking every possible place on our side of the fence for a mistake on our end. Our collection process has not changed since the July release of Web Logs. Does anyone else out there have both CD1 and CD2 running, and can you see whether the requests and web_logs data diverge in your system?
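For anyone willing to run the same check, here is a rough sketch of the month-by-month comparison described above. It assumes you have already pulled the event timestamps from each table into Python as `datetime` values. The function names (`monthly_counts`, `compare_monthly`) are made up for illustration and are not part of CD1 or CD2.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Optional, Tuple


def monthly_counts(timestamps: Iterable[datetime]) -> Counter:
    """Count events per calendar month, keyed as 'YYYY-MM'."""
    return Counter(ts.strftime("%Y-%m") for ts in timestamps)


def compare_monthly(
    requests_ts: Iterable[datetime],
    web_logs_ts: Iterable[datetime],
) -> List[Tuple[str, int, int, Optional[float]]]:
    """Return (month, requests_count, web_logs_count, pct_missing) rows.

    pct_missing is the percentage by which the web_logs count falls short
    of the requests count for that month. It is None when the month has
    no requests rows, since there is nothing to compare against.
    """
    req = monthly_counts(requests_ts)
    logs = monthly_counts(web_logs_ts)
    rows = []
    # Walk every month that appears in either source, in order.
    for month in sorted(set(req) | set(logs)):
        r, w = req[month], logs[month]  # Counter yields 0 for absent months
        pct = round(100.0 * (r - w) / r, 2) if r else None
        rows.append((month, r, w, pct))
    return rows
```

This only compares counts, so it will not tell you which events are missing, but it should be enough to show whether the gap is growing month over month. As in my own comparison, leave out any month where one of the sources only covers part of the month.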