Counting duplicates in big files

Closed Posted 4 years ago Paid on delivery

I have about 14 million rows of data with 6 columns in CSV format.

I created a working solution in Power BI that does the job within 30 minutes, but the program limits the number of rows that can be exported for further processing and can only run 2 files (and is sometimes buggy), whereas I need to run 6 files a day.

Target:

-A program, any data manipulation software, or SQL code that returns, for each row, the number of rows that share values with the current row, for every match size from 1 value up to all 6 columns

-The position of the column is not important in the check. For example, for a count of 5 shared values, the following 2 rows (representative, not actual data) each get a result of 1 because both contain 2,3,4,5,6

1,2,3,4,5,6 - 1

2,3,4,5,6,7 - 1

-It should be able to return the result fast: no more than 30 minutes (can be discussed), or a maximum of 4 hours for 6 files.
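One possible reading of the target above can be sketched in Python as follows. This is only an illustration, not the poster's Power BI solution: it assumes each row is treated as a set of distinct values (column order ignored), and that "count of k similar entries" means the number of other rows sharing exactly k values with the current row. The function name `shared_value_counts` and the `width` parameter are hypothetical. The sketch counts every subset of each row's values, then uses binomial inversion to turn "at least k" subset tallies into exact counts.

```python
from collections import Counter
from itertools import combinations
from math import comb


def shared_value_counts(rows, width=6):
    """For each row, return {k: number of OTHER rows sharing exactly k values}.

    Assumption: a row is treated as a set of distinct values, so column
    position is ignored and repeated values inside one row count once.
    """
    # Canonical key per row: sorted tuple of its distinct values.
    keys = [tuple(sorted(set(row))) for row in rows]

    # Count how many rows contain each value subset of size 1..len(key).
    subset_counts = Counter()
    for key in keys:
        for k in range(1, len(key) + 1):
            subset_counts.update(combinations(key, k))

    results = []
    for key in keys:
        # s[k] = sum over other rows of C(shared_values, k).
        s = [0] * (width + 1)
        for k in range(1, len(key) + 1):
            s[k] = sum(subset_counts[c] - 1 for c in combinations(key, k))

        # Binomial inversion: recover "exactly k shared" from the s[m] tallies.
        exact = {}
        for k in range(1, width + 1):
            exact[k] = sum(
                (-1) ** (m - k) * comb(m, k) * s[m]
                for m in range(k, width + 1)
            )
        results.append(exact)
    return results


if __name__ == "__main__":
    sample = [(1, 2, 3, 4, 5, 6), (2, 3, 4, 5, 6, 7)]
    for row, counts in zip(sample, shared_value_counts(sample)):
        print(row, counts)
```

With the two representative rows from the post, each row gets a count of 1 for 5 shared values and 0 for every other size. Note that each row produces up to 63 subsets, so at 14 million rows this in-memory version would probably need too much RAM; processing one subset size at a time, or pushing the subset counts into a database table, are options to evaluate against the 30-minute target.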

Note: Unfortunately, I cannot give a milestone payment for a program/solution that cannot meet the processing time.

Power BI Python SQL MySQL

Project ID: #20307066

About the project

25 proposals Remote project Active 4 years ago

25 freelancers are bidding on average $48 for this job

tausy

Hi, I'm a data engineer with over 5 years of industry experience on a wide array of tech stacks including databases, data warehouses, machine learning, big data/Hadoop. I'm currently pursuing my Master's in Data Scien More

$50 USD in 3 days
(53 Reviews)
5.7
jayantavkumar

I can upload the file into SQL db using SSIS ETL with removal of duplicate records with efficient performance. And there will not be any restriction of no. of files. You can load N number of files in one go. Let me kno More

$30 USD in 1 day
(21 Reviews)
4.7
IshaqKN

Hi, I understood your problem very nicely, processing large amount of csv data in an efficient and speedy way. Well Python is your tool for this task. This is the type of problem (Data Processing) Python solves the be More

$20 USD in 2 days
(23 Reviews)
4.4
AliSafder

Hi, my name is Ali and I can work on the task with immediate availability. I can do the duplicate check in SQL Server. Let's have a quick discussion so I can work on it.

$30 USD in 3 days
(29 Reviews)
4.6
ranahashim

Hi. I can make a program that can solve your problem. I have enough experience to tackle the problem. Message me to discuss

$25 USD in 7 days
(14 Reviews)
4.0
AlexFaster

Hi. I can write this program in a native language (not C# or Python) and it will run very fast. See my reviews and completion rate on this site. Regards, Alex.

$250 USD in 3 days
(3 Reviews)
4.0
jiteshparwal93

Hey I have got your requirement and can deliver you a SQL script that will compute results within maximum 10 minutes. You can message me to get query and check if it is giving you result within time and then you can a More

$35 USD in 1 day
(3 Reviews)
3.6
aap31374

I am Anil, with more than 10 years of experience in SQL Server database development and administration. I have worked with big databases for clients like Match.com, Nationstar Mortgage and with TCS. I am also good f More

$35 USD in 1 day
(5 Reviews)
3.3
juttj110

Okay, the program will finish processing within your given time. But we need to discuss the job further over chat. Thanks

$30 USD in 2 days
(8 Reviews)
2.9
l0ginp

Hi, I can manipulate your CSV file with Python in 1 day. Please send me a message so that we can discuss it further. To make sure that employment will truly serve your requirement, you can evaluate my skill by giving pa More

$30 USD in 1 day
(6 Reviews)
2.9
Sendmefreelancer

Did you manage to make a decision to pick the freelancer? I have got the code ready and I will test it with the 14million rows of data if you can get me a sample CSV. It’s written in Python and is fairly looks for a More

$10 USD in 2 days
(1 Review)
2.2
sd21TheDeath

I see what you want, however it's not completely clear. So, I might want to ask a few things first if we decide to work on it. It won't take more than 2 days to complete such a program, so 7 days which I am proposing is More

$25 USD in 7 days
(1 Review)
1.0
ThinkStartPL

Hi there! I am a developer with 4+ years of experience in Python, Django, RoR and ReactJS. Please open the chat box for further discussion. Regards,

$25 USD in 10 days
(1 Review)
0.4
venusholiday657

Thank you for your post, sir. I have a good chance of bidding your project. I want to share opinions about your project by chat. Collaboration with you will be a great boon to me. I put time and quality first in all m More

$20 USD in 7 days
(0 Reviews)
0.0
shekmodi

Hi, We are a young startup based out of Bangalore specializing in the data analysis and data science domain. Recently, we completed a similar project in R language where we checked the words matching (instead of numbe More

$30 USD in 4 days
(0 Reviews)
0.0
alexannick

If it does not work in Python, then we can try to do it in Perl. If you have the data in a relational database, we can do something that includes both of them. I am very curious to understand... Singapore is one of t More

$250 USD in 7 days
(0 Reviews)
0.0
Lukacho

Hi! I can make an application for you in C#. It will be as fast as possible and process files in minimum time. I can do that in 1-2 hours. Write me to discuss details. Thanks!

$30 USD in 1 day
(0 Reviews)
0.0
natuzaid

Hi, I am an expert in Java and Python and I can complete this job within a day. I have read your requirements and look forward to working with you. Let's continue this in freelance chat.

$40 USD in 1 day
(0 Reviews)
0.0
PageOllice

Hello, Thanks for posting this job and giving us opportunity to apply on it. I have read project description and can assure you that I can handle this job. Please reply back to get into more details over chat board. More

$20 USD in 7 days
(0 Reviews)
0.0
krushnadebashram

I can upload the file into the free version of SQL Express DB using OPENQUERY/OPENROWSET. The CSV file is dumped in a location on the system where SQL Express runs, then a T-SQL script gets the desired result. The whole pro More

$70 USD in 2 days
(0 Reviews)
0.0