@sam_mcknight ,
I'm not sure this is guaranteed.
At face value, it seems like a dangerous approach and a good way to end up with duplicate data.
Canvas occasionally repacks data, maybe about once a month. I download the data every 2 days and have been inserting it into the requests table. When I look at the data I have now, many of those files have been replaced, and only a few dates remain on the surviving files.
What I mean by that is that if I look at the dates on the files that remain, I have only a few dates from 2018, but they contain all of that data repacked into a few files. Some refer to this as a historical dump.
Apr 25 2018
Apr 27 2018
Aug 15 2018
Jun 9 2018
May 9 2018
Oct 17 2018
Jan 11 2019
Mar 5 2019
Apr 7 2019
May 19 2019
After May 19 2019, they start appearing about every 2 days, like I would expect. That May 19 set of files included data I had already inserted into the database during April and May 2019.
If I'm using the name of the file as a key, then I'm going to get duplicate records because of the repack: the repacked files have new names but contain data that was already loaded. They won't appear as duplicates in your system, though, because you're using the filename as the primary key.
What I do is have a separate location that contains the earliest and latest datetime from each file. I also store the latest datetime that I've added to the database. When I load new data, I specify a time (or use the last time stored) and only process files that are newer than that. If I need to go back and do a complete reload, then I wipe out my processing information and let it run the whole thing.
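The approach above can be sketched roughly like this, using SQLite for the bookkeeping. The table and column names (`load_state`, `watermark`) and the file-record tuples are my own assumptions for illustration, not anything Canvas-specific; the actual row inserts into the requests table are elided.

```python
import sqlite3

# Bookkeeping tables: one row per processed file (earliest/latest
# datetime it covers) plus a single watermark of the latest datetime
# already loaded into the database.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE load_state (
        file_name   TEXT PRIMARY KEY,
        earliest_dt TEXT,
        latest_dt   TEXT
    )
""")
conn.execute("CREATE TABLE watermark (latest_loaded_dt TEXT)")

def latest_loaded(conn):
    # Returns None on a fresh or complete reload (empty watermark table).
    return conn.execute("SELECT MAX(latest_loaded_dt) FROM watermark").fetchone()[0]

def process_files(conn, files):
    """Process only files strictly newer than the stored watermark.

    `files` is a list of (name, earliest_dt, latest_dt) tuples with
    ISO-8601 datetime strings, so plain string comparison orders them.
    """
    cutoff = latest_loaded(conn)
    loaded = []
    for name, earliest, latest in files:
        if cutoff is not None and latest <= cutoff:
            continue  # already covered by an earlier load (or a repack of it)
        # ...insert this file's rows into the requests table here...
        conn.execute("INSERT OR REPLACE INTO load_state VALUES (?, ?, ?)",
                     (name, earliest, latest))
        loaded.append(name)
    if loaded:
        new_mark = conn.execute("SELECT MAX(latest_dt) FROM load_state").fetchone()[0]
        conn.execute("DELETE FROM watermark")
        conn.execute("INSERT INTO watermark VALUES (?)", (new_mark,))
    return loaded
```

A complete reload is then just `DELETE FROM load_state` and `DELETE FROM watermark` before running the loader again, which matches the "wipe out my processing information" step.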
Now, if you do a fresh load of all of the data, you could include the file name there, but it would probably be better to keep a separate table listing the processed files, rather than making the file name part of the requests table directly.
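As a rough sketch of that separation, again using SQLite: the `processed_files` table name, its columns, and the trimmed-down `requests` schema are my own assumptions, not the actual Canvas Data schema.

```python
import sqlite3

# Keep file-tracking out of the requests table itself: requests holds
# only the event data, while processed_files records which dump files
# have already been loaded.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE requests (
        request_id TEXT PRIMARY KEY,
        timestamp  TEXT
        -- ...the rest of the requests columns...
    );
    CREATE TABLE processed_files (
        file_name    TEXT PRIMARY KEY,  -- dump filenames are unique
        processed_at TEXT
    );
""")

def already_processed(conn, file_name):
    """Check the tracking table before loading a file's rows."""
    row = conn.execute(
        "SELECT 1 FROM processed_files WHERE file_name = ?", (file_name,)
    ).fetchone()
    return row is not None
```

This keeps the requests table narrow and lets you re-derive or reset the processing state without touching the data rows.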