Page 1 of 8
COSC 1114 Operating Systems Principles
Semester 2, 2022
Assignment 1
Assessment
Type
Individual assignment.
See Canvas for submission details
Marks awarded for meeting requirements as closely as possible. Clarifications/updates may be
made via announcements
elevant discussion forums.
Submission
Due Date
Start of week 7.
Marks XXXXXXXXXX% of semester)
1. Research Question
We can use Load Balancing as a technique of ensuring that all processes/threads in a
particular [problem are treated in such a fashion that they all finish at approximately the same
time. We frame the problem in the form of a map
educe problem, and then use Static Load
Balancing (S-LBAN) to adjust the process load for optimal net performance.
The question is how accurate can we do this? Is it worth the effort? Clearly Amazon’s Elastic
Computing (EC2) tells us that this is so on a large scale, which typically includes a lot of data
movement and networking. But what about in a real-time system, with no significant
networking – the data a
iving in a batch before the program begins, and having repeatable
statistical properties?
In other words, we measure, adjust, and deliver optimally.
The central task of this assignment is to produce an evidence-based report answering the
above questions.
Methodology
We propose to answer the RQ by creating a series of tasks. First some simple ones in order to
establish a baseline of result expectations, then to rewrite the tasks using the map
educe
software pattern to produce a set of concu
ent threads of different complexity. We can then
adjust the scheduling in order to answer the research question.
2. Overview of Methodology
In this assignment, you must write C/C++ code to implement a map / reduce problem. Map() is a process or function that
maps the input problem to the available resources which may mean subdividing the problem into components and solving
each of those component problems using prioritized job scheduling.
Then reduce() gathers the component solutions and combines them to form the global solution.
You will be provided with “dirty data” and the assignment is then divided into a number of tasks, each of which purports to
solve the problem a different way. In each case, there will be a performance measuring phase, and for the later methods, an
adjustment phase based on the performance metrics obtained.
Throughout, you will be measuring and properly documenting performance (and hopefully improving it)
In the last task, where streaming data is used, it will be important to reduce idle time by ensuring that all subtasks created by
map() finish at the same time, so that reduce() is not kept waiting.
Page 2 of 8
3. Learning Outcomes
This assessment relates to the following course learning outcomes:
• Summarise the full range of considerations in the design of file systems, summarise techniques
for achieving synchronisation in an operation system,
• Compare and contrast the common algorithms used for both pre-emptive and non-pre-emptive
scheduling of tasks in operating systems, such as priority, performance comparison, and fair-
share schemes. Contrast kernel and user mode in an operating system
• Evaluate and report appropriate design choices when solving real-world problems
• Analyse the key trade-offs between multiple approaches to operating system design.
4. Assessment details
General
Your solution to the problem you choose must not use busy waiting, which is where a program sits in a tight loop until some
condition is met as this waste’s CPU time. Instead, you must have each thread sleep until a condition is met (use a condition
variable), wake up and do some processing and then perhaps signal other threads that the job has been done. Please ensure
sufficient output so that it is clear when your program performs any actions such as adding to an a
ay, locking a lock, etc.
You should ensure in your design of the program that you only share between threads the minimum state (variables) possible
as the more information that is shared the more likely there is to be a conflict (deadlocks, race conditions, etc). Please see the
courseware material for more detail on these problems. If your algorithm requires randomness, you should ensure you seed
the random number generator co
ectly.
To ease marking of your assignment, you must call your executables "taskN", where N is the task number as described below.
All materials necessary for the markers to build your tasks must be included in the submitted ZIP file. The markers will use
the CS servers (titan, saturn, jupiter) for the marking, and the code is expected to run there.
Makefile
Your solution must be written in C / c++ and you must supply a Makefile that builds your solution incrementally and then
links them together. If you are feeling a big rusty about make files or have not used them before, we recommend going
through the following tutorial:
https:
www.tutorialspoint.com/makefile/index.htm
Compiler Settings
Please note that as a minimum you must use the ‘-Wall -g’ flags on gcc or equivalent on other compilers to generate
warnings. You may use any supported c++ compiler on the server - if you wish to use a standard above c++ on the server
with g++ you will need to use the scl command, e.g.:
scl enable devtoolset-9 bash
This should get you gcc version XXXXXXXXXXinstead of gcc XXXXXXXXXXUse “gcc –version” to confirm.
Graceful Exit
In order to account for the possibility of thread starvation, you will need to gracefully exit your simulation at the end of
around 10 seconds. We would normally expect even the slowest task – task1() – to not take longer than 10 seconds to
execute.
Use the following method to do this: once you have launched all the required threads from main, call the sleep function, to
specify sleep for ten seconds. Once the sleep function finishes, change the value of a global variable to indicate that all the
threads should exit, which they should be checking regularly.
A less graceful exit might be to call exit() after 10 seconds, but that risks leaving zombies behind with unfinished business –
potentially leading to data loss.
https:
www.tutorialspoint.com/makefile/index.htm
Page 3 of 8
Valgrind
The solution you submit will ideally be as bug-free as possible including free of memory bugs. Your markers will mark your
submission on the titan/jupite
saturn servers provided by the university and we will use the tool "valgrind" to test that your
program is memory co
ect (no invalid access to memory and all memory freed) and as such you should check your program
on titan before submission with this tool. This is for debugging and memory testing only. When gathering performance data,
you should do this on a dedicated machine or VM, since on titan, being a shared resource, the timings will obviously not be
epeatable.
The following command:
valgrind --track-origins=yes --leak-check=full --show-leak-kinds=all ./simulation
Will run your program with valgrind. Please note that in this case the name of the executable produced by compilation of
your program is 'executable'.
Allowed Concu
ency Functions
For the concu
ency part of the assignment, you must limit yourself to the following functions:
• pthread_create
• pthread_join
• pthread_detach
• pthread_mutex_init
• pthread_mutex_destroy
• pthread_mutex_trylock
• pthread_mutex_lock
• pthread_cond_wait
• pthread_cond_signal
• pthread_cond_init
• you may also need the pthread_mutexattr_* functions
• you may also use the scheduling priority function.
This list is not exhaustive and so if you are unsure, please ask on the discussion board about the function you wish to use.
Please note that in practice beyond this course, you would use higher-level c++ functions that call these functions, but part
of what you are learning here is the underlying calls that are made as part of managing concu
ency. That is, part of what
you are learning is to apply the theory that you learn in the workshops to practical problems.
The Source Data
To start off, you may use a words list conventionally stored in ‘/us
share/dict/linux.words’ as a clean data file. You may
need to install this in your distro. Titan does not have it.
The actual data to be used is at https:
www.keithv.com/software/wlist/.
You will find a number of ZIP files containing text files containing (ideally) 1 word per line which you read into an a
ay for
subsequent processing. In fact, the data is ‘dirty’ and you will need to clean It up first. In your performance measurement
using Amdahl’s Law, this is the serial part.
Performance Measurement and Reporting
You will need to devise a way of measuring improvement using primarily execution time, but also resource usage. Consider
some of the metrics discussed in class.
You then need to describe this using graphs and other ways to describe how what you did improved performance, and why.
See reporting details below.
https:
www.keithv.com/software/wlist
Page 4 of 8
5. The Tasks in detail.
The fundamental task is to receive a set of words, clean it up, then use map
educe ultimately to sort it. Words of length
3-15 characters are to be sorted on the third character onwards. Words of other lengths are to be ignored (filtered out).
The tasks below achieve this in different ways.
Below you will find a selection of five tasks called task1 … task5 that must all be done in sequence. In each case, the
simulations should not exceed about 10 seconds and terminate cleanly - no crashes, no deadlocks, no race conditions. This
should be enough time to gather event statistics. You may need to replicate the data in order to make it run long enough. If
so