Assignment 3: awk
CS3423 - Systems Programming Sam Silvestro
Document version v1.0.0
Introduction
For this assignment you will use awk to create a program that summarizes and prints information
based on directory listing data.
You are not to use any other programs, utilities, or scripting languages not covered in class, unless
otherwise specifically and explicitly stated in this document.
Your program should take the output from the modified ls command line seen below, and process
the data in order to output the aggregate information:
ls -la --time-style='+%Y/%m/%d %H:%M:%S'
In fact, to avoid human error and ensure you are always using the correct command line, I suggest
creating and adding a new alias to your bash resource configuration file:
alias lsa="\ls -la --time-style='+%Y/%m/%d %H:%M:%S'"
Note that the inclusion of the leading backslash ensures no other previously-defined/existing ls
aliases are used; certain other options such as -h could cause your script to fail, for example.
Aggregated information requirements
The aggregated information processed from the directory listing data should consist of each of the
following, when applicable, in the order specified below (see the input/output example further below
for an example of proper output formatting):
• Per-user grouping of file-related counts found in specified directories
– Username of the entity owning these files
– Total number of directories found that are owned by this user
– Total number of files found owned by this user, printing three values:
∗ All files
∗ Hidden files
∗ “Other” files found that are owned by this user
(these items include, but are not limited to, symbolic links, FIFOs, character or
block devices, etc. Basically, anything that is neither a regular file nor a directory will
fall under this category)
– Total file storage (in bytes) occupied by this user’s regular files.
• Itemization of the oldest and newest regular files found (if no regular files exist in the listing,
simply report "None" for these items. If only one regular file exists, it is reasonable to report
this file as both the oldest and newest.)
Also note, if multiple files share the same oldest or newest time stamps, you can break the tie
however you wish; there are no guidelines you must adhere to while doing so.
• Total file-related counts found in the specified directories
– Total users owning files within these paths
– Total number of files found, printing two values: all files versus hidden files
– Total number of directories found
– Total number of “other” files found
(these items include, but are not limited to, symbolic links, FIFOs, character or block
devices, etc. Basically, anything that is neither a regular file nor a directory will fall under
this category)
– Total file storage (in bytes) occupied by all regular files listed.
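The counting rules in the list above can be sketched as follows. This is only a rough illustration, not the required solution: the function name classify_listing and the one-line output format are made up for the example, and it assumes that "hidden" means a regular file whose name starts with a dot, that the owner is field 3, the size field 5, and the name starts at field 8 (which matches the ls command given earlier when owner and group names contain no spaces).

```shell
# Hypothetical helper: per-user counts from an `ls -la --time-style=...` listing.
classify_listing() {
    awk '
    # "total N" lines, blank lines, and "dir:" headers have fewer than 8 fields.
    NF < 8 { next }
    {
        type = substr($1, 1, 1)      # first character of the permission string
        user = $3
        seen[user] = 1
        if (type == "d") {
            dirs[user]++
        } else if (type == "-") {    # regular file
            files[user]++
            bytes[user] += $5
            if ($8 ~ /^\./) hidden[user]++   # assumption: dot-name means hidden
        } else {                     # links, FIFOs, devices, sockets, ...
            others[user]++
        }
    }
    END {
        # Iteration order of "for (u in seen)" is unspecified in awk.
        for (u in seen)
            printf "%s dirs=%d files=%d hidden=%d others=%d bytes=%d\n",
                   u, dirs[u], files[u], hidden[u], others[u], bytes[u]
    }' "$@"
}
```

It can be fed a saved listing (classify_listing listing.txt) or piped input (lsa dir1 | classify_listing); the real program would print the report format shown in the example output instead.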
Note: again, do not use sed, Python, or any other languages or utilities not explicitly allowed by
this assignment.
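For the oldest/newest requirement listed above, one possible pure-awk approach relies on the time style itself: because '+%Y/%m/%d %H:%M:%S' is zero-padded and ordered from most to least significant unit, two timestamps can be compared as plain strings. The sketch below assumes the date is field 6 and the time field 7; the function name oldest_newest is hypothetical and the output layout is only illustrative.

```shell
# Hypothetical helper: report the oldest and newest regular files in a listing.
oldest_newest() {
    awk '
    # Only regular files (permission string starting with "-") are considered.
    NF >= 8 && substr($1, 1, 1) == "-" {
        stamp = $6 " " $7            # concatenation forces a string comparison
        if (oldest == "" || stamp < oldest_stamp) { oldest = $0; oldest_stamp = stamp }
        if (newest == "" || stamp > newest_stamp) { newest = $0; newest_stamp = stamp }
    }
    END {
        print "Oldest file:"; print (oldest == "" ? "None" : oldest)
        print "Newest file:"; print (newest == "" ? "None" : newest)
    }' "$@"
}
```

Ties are resolved here by keeping the first file seen, which is one of the arbitrary choices the specification permits.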
Note 2: be sure to test the processing of ls listings for multiple directories, rather than just one. Such
listings can be generated by passing more than one directory to ls and/or by the simple addition
of the -R recursive option to the custom ls command shown previously. Two examples of such
command lines can be seen here:
ls -la --time-style='+%Y/%m/%d %H:%M:%S' dir1 dir2 dir3
ls -laR --time-style='+%Y/%m/%d %H:%M:%S' dir1
or if you have defined the aforementioned alias, equivalently:
lsa dir1 dir2 dir3 file1 dir4
lsa -R dir1 file1 dir2
Note that these commands can also include filenames alongside the directory names on the command
line; this is perfectly permissible and should be accounted for, which is why it is shown in the
example above.
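To see what such multi-directory listings look like, a small throwaway tree can be generated and listed. The sketch below assumes GNU ls (for --time-style) and uses made-up names (dir1, dir2, file1, and so on). The final awk filter prints only the lines that are not file entries: the "dir:" headers, the "total N" lines, and the blank separators, all of which the real program would need to skip.

```shell
# Build a small throwaway tree (all names here are made up for illustration).
tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/sub" "$tmp/dir2"
touch "$tmp/dir1/.hidden" "$tmp/dir1/sub/a.txt" "$tmp/file1"
ln -s a.txt "$tmp/dir1/sub/link"

# Capture a recursive listing that mixes directories and a plain file.
listing=$(cd "$tmp" && ls -laR --time-style='+%Y/%m/%d %H:%M:%S' dir1 dir2 file1)
rm -rf "$tmp"

# Show only the non-entry lines (headers, totals, blank lines).
printf '%s\n' "$listing" | awk 'NF < 8'
```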
Example
The example execution provided below is an excerpt from the following command, executed using
the provided example input file:
ssilvestro@fox05: /courses/cs/3423/Summer20/assign3$ ./assign3.bash data/input.txt
Alternatively, the script could be executed as follows on any arbitrary directory using the specified
ls command:
ls -la --time-style='+%Y/%m/%d %H:%M:%S' ~ | ./assign3.bash
Example Input Data
ssilvestro@fox05: /courses/cs/3423/Summer20/assign3$ head -n 30 data/input.txt
total 17160
drwxrwxrwt 98 root root XXXXXXXXXX/04/07 13:38:14 .
drwxr-xr-x 26 root root XXXXXXXXXX/09/04 10:50:29 ..
drwx XXXXXXXXXXpmp099 students XXXXXXXXXX/03/03 20:57:31 appInsights
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 18:41:59 build4129.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:18:42 build8335.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:10:44 build3549.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:08:55 build4369.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 18:18:44 build4943.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:17:13 build0725.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 19:08:39 build5604.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:08:08 build9771.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:08:32 build5695.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:13:35 build6382.log
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 20:07:57 build4429.log
drwxr-xr-x 3 bfn715 students XXXXXXXXXX/03/03 23:07:12 dlight_bfn715
drwx XXXXXXXXXXdad980 students XXXXXXXXXX/03/05 15:44:15 dlight_dad980
drwx XXXXXXXXXXh
980 students XXXXXXXXXX/04/06 09:54:44 dlight_h
980
drwx XXXXXXXXXXhrm102 students XXXXXXXXXX/04/06 18:43:17 dlight_hrm102
drwx XXXXXXXXXXkaq447 students XXXXXXXXXX/02/26 17:58:46 dlight_kaq447
drwx XXXXXXXXXXmce237 students XXXXXXXXXX/03/30 00:04:57 dlight_mce237
drwx XXXXXXXXXXmjy610 students XXXXXXXXXX/02/27 15:33:54 dlight_mjy610
drwx XXXXXXXXXXpdq039 students XXXXXXXXXX/04/06 18:43:48 dlight_pdq039
drwx XXXXXXXXXXxie192 students XXXXXXXXXX/03/23 17:47:37 dlight_xie192
drwx XXXXXXXXXXynb963 students XXXXXXXXXX/04/07 13:26:46 dlight_ynb963
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/03/09 16:25:53 exe XXXXXXXXXXtxt
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/04/03 13:39:09 exe XXXXXXXXXXtxt
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/03/09 13:28:36 exe XXXXXXXXXXtxt
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/04/06 10:16:23 exe XXXXXXXXXXtxt
-rw XXXXXXXXXXmce237 students XXXXXXXXXX/03/01 18:17:50 exe XXXXXXXXXXtxt
...
...
Example Output
Username: mjy610
Directories: 3
Username: h
980
Files:
All: 196
Hidden: 2
Directories: 3
Storage (B): 77543 bytes
Username: pdq039
Directories: 3
Username: zqu051
Files:
All: 452
Hidden: 0
Storage (B): XXXXXXXXXXbytes
Username: mce237
Files:
All: 52
Hidden: 1
Directories: 4
Storage (B): XXXXXXXXXXbytes
Username: dad980
Files:
All: 4
Hidden: 1
Directories: 3
Storage (B): 6614 bytes
Username: pmp099
Directories: 2
Others: 10
Username: ynb963
Files:
All: 1
Hidden: 0
Directories: 3
Storage (B): 2894 bytes
Username: xie192
Directories: 3
Username: kaq447
Files:
All: 2
Hidden: 0
Directories: 3
Storage (B): 3092 bytes
Username: bfn715
Directories: 3
Username: root
Files:
All: 1
Hidden: 0
Directories: 5
Others: 1
Storage (B): 11 bytes
Username: hrm102
Directories: 3
Oldest file:
-r--r--r-- 1 root root XXXXXXXXXX/02/23 12:11:04 yum.p1922.lock
Newest file:
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/04/07 11:10:08 output XXXXXXXXXX
Total users: 13
Total files:
(All / Hidden): ( 708 / 4 )
Total directories: 38
Total others: 11
Storage (B): XXXXXXXXXXbytes
Extra Credit (200% / n)
A 200% bonus will be awarded to those whose script correctly sorts the username-grouped
portion of the output based on the total computed storage space for each user (use the
“Storage” field for this number), displayed in ascending order of total storage size (i.e. users
with the least/no storage consumption will appear first). Break ties alphabetically (e.g. many users
will consume zero storage space because “other” files and directories do not contribute to storage;
only regular files contribute).
Once again, n represents the number of students who completed this extra credit portion correctly,
in its entirety. No partial credit will be awarded; this feature must function properly, as
described, in order to be eligible for this bonus.
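The ordering described above (ascending storage, ties broken alphabetically) can be produced inside awk itself with a small insertion sort, which avoids relying on any external utility. This is only a sketch: the function name sort_users and the "username bytes" input format are hypothetical stand-ins for whatever per-user arrays the real program has built by its END block.

```shell
# Hypothetical helper: order "username bytes" lines by bytes, then by name.
sort_users() {
    awk '
    { name[NR] = $1; size[NR] = $2 + 0 }

    # Returns 1 when record a must be placed after record b.
    function after(a, b) {
        if (size[a] != size[b]) return size[a] > size[b]
        return (name[a] "") > (name[b] "")   # "" forces string comparison
    }

    END {
        for (i = 1; i <= NR; i++) idx[i] = i
        # Insertion sort over the index array; fine for a handful of users.
        for (i = 2; i <= NR; i++) {
            k = idx[i]
            for (j = i - 1; j >= 1 && after(idx[j], k); j--) idx[j + 1] = idx[j]
            idx[j + 1] = k
        }
        for (i = 1; i <= NR; i++) print name[idx[i]], size[idx[i]]
    }' "$@"
}
```

For example, feeding it the lines "root 11", "xie192 0", "bfn715 0", and "ynb963 2894" yields bfn715, xie192, root, ynb963, in that order, matching the pattern of the extra credit output.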
Extra Credit Output
Username: bfn715
Directories: 3
Username: hrm102
Directories: 3
Username: mjy610
Directories: 3
Username: pdq039
Directories: 3
Username: pmp099
Directories: 2
Others: 10
Username: xie192
Directories: 3
Username: root
Files:
All: 1
Hidden: 0
Directories: 5
Others: 1
Storage (B): 11 bytes
Username: ynb963
Files:
All: 1
Hidden: 0
Directories: 3
Storage (B): 2894 bytes
Username: kaq447
Files:
All: 2
Hidden: 0
Directories: 3
Storage (B): 3092 bytes
Username: dad980
Files:
All: 4
Hidden: 1
Directories: 3
Storage (B): 6614 bytes
Username: h
980
Files:
All: 196
Hidden: 2
Directories: 3
Storage (B): 77543 bytes
Username: zqu051
Files:
All: 452
Hidden: 0
Storage (B): XXXXXXXXXXbytes
Username: mce237
Files:
All: 52
Hidden: 1
Directories: 4
Storage (B): XXXXXXXXXXbytes
Oldest file:
-r--r--r-- 1 root root XXXXXXXXXX/02/23 12:11:04 yum.p1922.lock
Newest file:
-rw XXXXXXXXXXh
980 students XXXXXXXXXX/04/07 11:10:08 output XXXXXXXXXX
Total users: 13
Total files:
(All / Hidden): ( 708 / 4 )
Total directories: 38
Total others: 11
Storage (B): XXXXXXXXXXbytes
Assignment Data
A few sample input files can be found at the following location on the fox servers; however, it
is imperative that you fabricate many of your own examples to ensure that your script functions
according to the specifications outlined above:
us
local/courses/ssilvestro/cs3423/Summer20/assign3.
Script Files
Your submission should consist of exactly two files:
• assign3.bash - a bash script used as the driver program for your awk script
• assign3.awk - the awk program used in assign3.bash
Script Execution
Your program should be invoked through a single bash file, with input taken from either standard
input or an arbitrary set of filenames specified on the command line, as shown below.
In addition to the above Assignment Data, your program should also work with arbitrary input from
the ls -la --time-style='+%Y/%m/%d %H:%M:%S' command defined above. This includes
both reading from one or more named input files and accepting piped or redirected input
directly into standard input, as in these examples:
ls -la --time-style='+%Y/%m/%d %H:%M:%S' ~ | ./assign3.bash
– or –
./assign3.bash listing.txt [listing2.txt [...]]
– or –
./assign3.bash < listing.txt
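All three invocation styles above fall out of one awk behavior: when awk is given file operands it reads them in order, and when it is given none it reads standard input. A driver can therefore simply forward its arguments with "$@". The sketch below is an assumption about how such a driver might look, not the required one; the function name summarize is hypothetical, and the inline one-line program stands in for -f assign3.awk so that the example runs on its own.

```shell
#!/bin/bash
# Sketch of a driver in the spirit of assign3.bash.
# A real driver would likely run:  awk -f assign3.awk "$@"
# The inline program below is only a stand-in that counts listing entries.
summarize() {
    # "$@" passes every filename through unchanged; with no arguments,
    # awk falls back to reading standard input (pipe or redirection).
    awk 'NF >= 8 { n++ } END { print "entries:", n + 0 }' "$@"
}
```

Quoting "$@" (rather than using $* or an unquoted $@) keeps filenames that contain spaces intact.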
Submission
Turn your assignment in via Blackboard. Your zip file, named a3-abc123.zip (with abc123 replaced by
your personal abc123), should contain only your two files: assign3.bash and assign3.awk.
If you attempt the extra credit, name your file a3-abc123_EC.zip. Without the _EC, your submission
will be graded as normal.