Counting duplicates in big files
$10-30 USD
Paid on delivery
I have about 14 million rows of data with 6 columns in CSV format.
I created a working solution in Power BI that does the job within 30 minutes, but the program limits the number of rows that can be exported for further processing and can only run 2 files (and is sometimes buggy), whereas I need to run 6 files a day.
Target:
-A program, data manipulation software, or SQL code that returns, for each row, the count of rows that have similar content to the current row, at every level of similarity from 1 matching entry up to all 6 columns/entries.
-The position of the column is not important in the check. E.g., for the count of 5 similar entries, the following 2 rows (representative, not actual data) will each have a result of 1 because they share 2,3,4,5,6:
1,2,3,4,5,6 - 1
2,3,4,5,6,7 - 1
-It should be able to return the result fast: not more than 30 minutes (can be discussed), or a maximum of 4 hours for 6 files.
Note: Unfortunately, I cannot give a milestone payment for a program/solution that cannot meet the processing time.
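To make the target concrete, here is a minimal Python sketch of one possible reading of the requirement, not a tuned solution. It assumes that "similar" means sharing exactly k values with another row regardless of column position, that the values within a row are distinct, and that each row is counted against the other rows only. The function name `overlap_counts` and the inverted-index approach are illustrative choices, not something specified in the post.

```python
from collections import Counter, defaultdict


def overlap_counts(rows, width=6):
    """For each row, return a list `tally` where tally[k-1] is the number of
    OTHER rows sharing exactly k values with it (column position ignored)."""
    # Assumption: values within a row are distinct, so a set loses nothing.
    row_sets = [set(r) for r in rows]

    # Inverted index: value -> ids of the rows that contain it.
    index = defaultdict(list)
    for rid, values in enumerate(row_sets):
        for v in values:
            index[v].append(rid)

    results = []
    for rid, values in enumerate(row_sets):
        # Count how many values each candidate row shares with this row.
        shared = Counter()
        for v in values:
            for other in index[v]:
                if other != rid:
                    shared[other] += 1
        # Turn per-row overlaps into a histogram over k = 1..width.
        tally = [0] * width
        for k in shared.values():
            tally[k - 1] += 1
        results.append(tally)
    return results


# The two representative rows from the post share 2,3,4,5,6 (5 values).
rows = [
    ["1", "2", "3", "4", "5", "6"],
    ["2", "3", "4", "5", "6", "7"],
]
for row, tally in zip(rows, overlap_counts(rows)):
    print(",".join(row), "-", tally)
```

For the two example rows, the fifth position of each tally (5 shared entries) is 1, matching the expected result above. The inverted index avoids comparing every pair of rows directly, but the running time on 14 million rows still depends on how often each value repeats, so the 30-minute target would need to be checked on real data.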
Project ID: #20307066
About the project
25 freelancers are bidding on average $48 for this job
I can upload the file into a SQL database using SSIS ETL, removing duplicate records with efficient performance. There will be no restriction on the number of files; you can load N files in one go. Let me kno...
Hi, My name is Ali and I can work on the task with immediate availability. I can do duplication check in SQL Server. Let's have quick discussion so I can work on it.
Hi. I can make a program that can solve your problem. I have enough experience to tackle the problem. Message me to discuss
Hi. I can write this program in a native language (not C# or Python) and it will calculate very fast. See my reviews and completion rate on this site. Regards, Alex.
Hey, I have got your requirement and can deliver a SQL script that will compute the results within a maximum of 10 minutes. You can message me to get the query and check whether it gives you the result within that time, and then you can a...
Okay, the program will process within your given time, but we need to discuss the job further over chat. Thanks
Did you manage to pick a freelancer? I have the code ready and will test it with the 14 million rows of data if you can get me a sample CSV. It’s written in Python and is fairly looks for a...
I see what you want; however, it's not completely clear, so I might want to ask a few things first if we decide to work on it. It won't take more than 2 days to complete such a program, so 7 days which I am proposing is...
Hi there! I am a developer with 4+ years of experience in Python, Django, RoR & ReactJS. Please open the chat box for further discussion. Regards,
Thank you for your post, sir. I have a good chance of bidding your project. I want to share opinions about your project by chat. Collaboration with you will be a great boon to me. I put time and quality first in all m...
If it does not work in Python, then we can try to do it in Perl. If you have the data in a relational database, we can do something that includes both of them. I am very curious to understand... Singapore is one of t...
Hi! I can make an application for you in C#. It will be as fast as possible and process the files in minimal time. I can do that in 1-2 hours. Write me to discuss details. Thanks!
Hi, I am an expert in java and python and I can complete this job within a day. I have read your requirements and look forward to working with you. Let's continue this in freelance chat
Hello, thanks for posting this job and giving us the opportunity to apply for it. I have read the project description and can assure you that I can handle this job. Please reply to get into more details over the chat board.
I can upload the file into the free version of SQL Express DB using OPENQUERY/OPENROWSET. The CSV file is dumped to a location on the system where SQL Express runs, then a T-SQL script gets the desired result. The whole pro...