I recently applied for a data scientist position. I can send you the case study I worked on for the application process. It had around 1.8M rows and was basically about sales cannibalization. I had the sales data for each item, for each day, in each store. There were promotion periods, and I had the product tree to understand which products are from the same family:
Date,"StoreCode","ProductCode","SalesQuantity"
2015-01-01,8,9,-1
2015-01-01,131,9,1
(A SalesQuantity of -1 means a return.)
ProductCode,"ProductGroup1","ProductGroup2"
1,"A",5
Period,StartDate,EndDate
Promo1,2/10/2015,2/17/2015
Promo2,3/15/2015,3/22/2015
(These were 3 separate files.) I didn't have price data, so this was simpler than what you have now, but I still managed to impress them.
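As a rough sketch (not the case study's actual code), the three files could be tied together like this in Python, using only the standard library. The inline strings reuse the sample rows shown above. The function names (`read_rows`, `load_promos`, `promo_for`, `enrich_sales`) and the output field names are made up for illustration, and treating the promo EndDate as inclusive is an assumption.

```python
import csv
import io
from datetime import datetime

# Tiny inline samples copied from the rows above; the real files were much larger.
SALES_CSV = '''Date,"StoreCode","ProductCode","SalesQuantity"
2015-01-01,8,9,-1
2015-01-01,131,9,1
'''
PRODUCTS_CSV = '''ProductCode,"ProductGroup1","ProductGroup2"
1,"A",5
'''
PROMOS_CSV = '''Period,StartDate,EndDate
Promo1,2/10/2015,2/17/2015
Promo2,3/15/2015,3/22/2015
'''


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header names."""
    return list(csv.DictReader(io.StringIO(text)))


def load_promos(text):
    """Return (name, start, end) tuples; promo dates are month/day/year."""
    promos = []
    for row in read_rows(text):
        start = datetime.strptime(row["StartDate"], "%m/%d/%Y").date()
        end = datetime.strptime(row["EndDate"], "%m/%d/%Y").date()
        promos.append((row["Period"], start, end))
    return promos


def promo_for(day, promos):
    """Name of the promo covering `day`, or None (EndDate assumed inclusive)."""
    for name, start, end in promos:
        if start <= day <= end:
            return name
    return None


def enrich_sales(sales_text, products_text, promos_text):
    """Attach product group, return flag, and promo period to each sales row."""
    products = {r["ProductCode"]: r for r in read_rows(products_text)}
    promos = load_promos(promos_text)
    enriched = []
    for row in read_rows(sales_text):
        day = datetime.strptime(row["Date"], "%Y-%m-%d").date()
        qty = int(row["SalesQuantity"])
        # Products missing from the product tree get an empty lookup.
        product = products.get(row["ProductCode"], {})
        enriched.append({
            "date": day,
            "store": row["StoreCode"],
            "product": row["ProductCode"],
            "quantity": qty,
            "is_return": qty < 0,  # negative quantity means a return
            "group1": product.get("ProductGroup1"),
            "promo": promo_for(day, promos),
        })
    return enriched


rows = enrich_sales(SALES_CSV, PRODUCTS_CSV, PROMOS_CSV)
```

With rows enriched this way, sales of products in the same family can be compared inside and outside promo periods, which is the kind of comparison a cannibalization analysis builds on.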