Option #1: Working with Big Data using Multithreading
The goal of this project is to use the concepts taught in this course to develop an efficient way of working with Big Data.
You should have two files on your Linux system: hugefile1.txt and hugefile2.txt, each containing one billion lines. If you do not, please go back to the Module 7 Portfolio Reminder and complete the steps there.
Create a program, in a programming language of your choice, that produces a new file, totalfile.txt, by adding the numbers on corresponding lines of the two input files. That is, each line of totalfile.txt is the sum of the corresponding lines in hugefile1.txt and hugefile2.txt.
For example, if the first 5 lines of your files look as follows:
$ head -5 hugefile*txt
==> hugefile1.txt <==
4131
29929
6483
7659
25003

==> hugefile2.txt <==
8866
19171
11029
4889
27069
then the first 5 lines of totalfile.txt look like this:
$ head -5 totalfile.txt
12997
49100
17512
12548
52072
Because files this large cannot be read into memory in their entirety, you need to use concurrency. Reading the files one line at a time will take a long time, so use what you have learned in this course to optimize the process. Be sure to record how long each version of your program takes to complete the task.
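For reference, a sequential baseline might look like the following sketch (Python is used here as one possible language choice; the file names come from the assignment). Iterating over a file object streams it line by line with internal buffering, so memory use stays bounded even for a billion lines, but all of the work happens on a single core:

import time

t0 = time.perf_counter()
with open("hugefile1.txt") as f1, open("hugefile2.txt") as f2, \
        open("totalfile.txt", "w") as out:
    # zip pairs up corresponding lines without loading either file fully
    for a, b in zip(f1, f2):
        out.write(f"{int(a) + int(b)}\n")
print(f"baseline: {time.perf_counter() - t0:.1f} s")

Timing each version the same way makes the later comparisons meaningful.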
Optimize the program by using threads, so that you benefit from the multiple cores in your CPU. Create a multithreaded program in which each thread works on its own chunk of the files.
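One possible shape for this, sketched in Python: divide the billion lines into fixed-size chunks, hand each worker one chunk, and stitch the partial outputs together at the end. The worker count, chunk arithmetic, and part-file names here are illustrative assumptions. Note that in CPython the global interpreter lock keeps threads from running Python bytecode on multiple cores at once, so for the CPU-bound parsing you would swap in ProcessPoolExecutor (in languages such as C or Java, threads map directly onto cores):

import concurrent.futures
import itertools

TOTAL_LINES = 1_000_000_000
WORKERS = 8                      # assumption: one worker per core
CHUNK = TOTAL_LINES // WORKERS

def sum_chunk(i):
    # Worker i skips ahead to its chunk and sums CHUNK line pairs.
    # (islice still scans the skipped lines; a byte-offset index built
    # in one preliminary pass would let workers seek directly instead.)
    with open("hugefile1.txt") as f1, open("hugefile2.txt") as f2, \
            open(f"chunk.{i:02d}", "w") as out:
        a = itertools.islice(f1, i * CHUNK, (i + 1) * CHUNK)
        b = itertools.islice(f2, i * CHUNK, (i + 1) * CHUNK)
        out.writelines(f"{int(x) + int(y)}\n" for x, y in zip(a, b))

if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=WORKERS) as pool:
        list(pool.map(sum_chunk, range(WORKERS)))
    # Concatenate the per-chunk outputs in order.
    with open("totalfile.txt", "w") as out:
        for i in range(WORKERS):
            with open(f"chunk.{i:02d}") as part:
                out.writelines(part)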
Now, break hugefile1.txt and hugefile2.txt into 10 files each, and run your process on all 10 pairs in parallel. How do the run times compare to those of the original process?
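A sketch of that workflow in Python (the part counts and file-name patterns are illustrative assumptions; the GNU split utility could do the splitting step equally well):

import concurrent.futures
import itertools

PARTS = 10
LINES_PER_PART = 100_000_000     # one billion lines / 10 parts

def split_file(src, prefix):
    # Stream src once, sending every LINES_PER_PART lines to its own part.
    with open(src) as f:
        for i in range(PARTS):
            with open(f"{prefix}.{i:02d}", "w") as out:
                out.writelines(itertools.islice(f, LINES_PER_PART))

def sum_pair(i):
    with open(f"hugefile1.part.{i:02d}") as f1, \
            open(f"hugefile2.part.{i:02d}") as f2, \
            open(f"totalfile.part.{i:02d}", "w") as out:
        out.writelines(f"{int(a) + int(b)}\n" for a, b in zip(f1, f2))

if __name__ == "__main__":
    split_file("hugefile1.txt", "hugefile1.part")
    split_file("hugefile2.txt", "hugefile2.part")
    # Process all 10 pairs at once, one OS process per pair.
    with concurrent.futures.ProcessPoolExecutor(max_workers=PARTS) as pool:
        list(pool.map(sum_pair, range(PARTS)))
    # Reassemble the partial results in order.
    with open("totalfile.txt", "w") as out:
        for i in range(PARTS):
            with open(f"totalfile.part.{i:02d}") as part:
                out.writelines(part)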
Explain your methods and results in detail. What conclusions can you draw about the different methods of optimizing large-file processing? How has what you learned in this course helped you accomplish this task?