Hi,
I am a SQL Developer with extensive experience in cloud-based Data Integration and Business Intelligence projects and building ETL processes (Extract, Transform, and Load) using open-source tool 'Talend for Data Integration', which is a Java code generator tool that builds programs out of workflow and data flow designs.
I also know the challenges involved in the enrichment and merging of raw data (your CSV files problem).
My approach:
- Use Talend job to capture filename and timestamp as command line parameters, and timestamp each row in file, and rename CSV file as well
- Use Talend job to either:
-- Copy renamed file to bucket in the cloud (Azure, AWS or Google), or
-- Load file into table located in cloud SQL database (Azure, AWS or Google)
- For reporting (several options):
1. Create straightforward connection between Power BI desktop and Azure SQL Database, publish workbooks (ONCE only), and have access to live data on the web.
2. Given integration between Azure and Power BI is seamless, we can also use Azure Storage to transfer CSV files.
3. Using Google BigQuery to load files, you can have your first terabyte (1 TB) of data processed each month for free. Combined this with any partner BI tool such as Tableau or Qlik. I could even look out for a suitable Open Source BI reporting tool for your project.
Deliverables:
- Packaged Java program + Source code
- Cloud setup
Look forward to hearing from you.
Kind regards,
Yordan B