
COMP 5070 Exam SP5 2018
        

COMP 5070 Statistical Programming for Data Science
Take Home Exam
DUE: by 11:55 PM (CST), Friday 23rd November
    
• The take-home exam is worth 30% of your overall grade. The exam is out of 100 marks.
• The exam is to be submitted online as a compressed file (e.g. .zip, .tar.gz, .gz). This compressed file should include ALL code needed to run your program and any other files you created yourself. You do NOT need to include any data files provided to you, as it will be assumed I too have them :)
• To obtain the maximum available marks you should aim to:

1. Code all requested components (30%).
2. Use a clear style of code presentation (10%). Code clarity is an important part of your submission. Thus you should choose meaningful variable names and adopt the use of comments. You don't need to comment every single line, as this will affect readability; however, you should aim to comment at least each section of code.
3. Have the code run successfully (5%).
4. Output the information in a presentable manner and present your written analysis of the output (55%).

• Plagiarism is a specific form of academic misconduct. Although the University encourages discussing work with others and the Social Forum will support this, ultimately this submission is to represent your individual work. If plagiarism is found, all parties will be penalised. You should retain copies of all assignment computer files used during development. These files must remain unchanged after submission, for the purpose of checking if required.
• For the purpose of this exam, a “paragraph” is considered to consist of approximately 6-8 lines. You are welcome to exceed this amount :)
• This exam appears longer than it actually is – explanations are given to help you understand the requested analyses and I have also provided hints.
• You do not need to write specialised code as you did for the assignments. You should be able to find nearly all the code you need from the R files provided throughout the course, via case studies and other examples. If you copy/paste code from the R code I have provided, this should give you nearly 100% of the code needed for this exam, with a few alterations on your behalf (e.g. filenames, variable names etc).
        
Question 1 (60 Marks)
It’s All in the Taste
Experts vs Amateurs

Who is better at discerning the tastes of supermarket chocolate? Do you really need training to know if you like it? Or does it all just taste really good?

The Experts battle it out against a group of dedicated chocolate-eating Amateurs!

I would really like to have that job :)

The data for this question are the responses to the sensometric qualities of chocolate that can be purchased in supermarkets. Two groups were asked to rate the qualities of the chocolates: the first group contained a panel of sensometric experts with responses recorded over 9 different tasting sessions. The accompanying data is in chocolate_experts.csv.

The second group contained a panel of volunteers chosen to represent ‘regular shoppers’ who underwent a three-hour sensometric training session before rating the qualities of the chocolate over 2 different tasting sessions. The accompanying data is in chocolate_amateurs.csv.

The responses were recorded over a continuous scale from 0 to 10 with 0 indicating the absence of the sensometric quality and 10 indicating fully present. It is of interest to determine if experts perceive supermarket chocolate differently to non-experts (the amateurs) using 14 sensometric variables (Chocolate Aroma through to Granular Texture in the data files).
    
For this question you need to randomly obtain two session ids for the expert responses only by making a call to sample as shown below. The two numbers that are returned are your session ids that you need to extract for your analysis.
    
sample(9,2)
    
For the expert data you will only need to analyse the responses corresponding to the two randomly selected session ids. Amateur data needs to be used in full.
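The session selection described above can be sketched as follows. This is a minimal illustration only: the data frame is a made-up stand-in for chocolate_experts.csv, the column name `Session` is an assumption, and `set.seed()` is included purely so the sketch is reproducible (the exam itself only asks for `sample(9,2)`).

```r
# Hypothetical stand-in for chocolate_experts.csv; the real column names may differ.
set.seed(2018)  # optional: makes the random session choice reproducible
experts <- data.frame(
  Session   = rep(1:9, each = 4),          # assumed name of the session id column
  Sweetness = round(runif(36, 0, 10), 1)   # one illustrative rating column
)

# Draw two distinct session ids from 1..9, as the exam requests.
session_ids <- sample(9, 2)

# Keep only the expert responses recorded in those two sessions.
expert_subset <- experts[experts$Session %in% session_ids, ]

print(session_ids)
print(table(expert_subset$Session))
```

With the real file, the same subsetting line would be applied to the data frame returned by `read.csv()`, using whatever the session column is actually called.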
    
You are asked to compare the responses between the two groups as requested in each part below. A partially written R script is available as part of the exam package. You must use this script for your analysis and follow the instructions therein. Any lines marked with

# ### !!! EXAM TIP !!!

require you to change that line of code to suit your purposes. Further details are provided in the code comments around that line.
        
For the purposes of this exam a paragraph is 8-12 lines of text. Specifically, your analysis should include:
    
i) Initial Data Discussion: Write a short explanation (approximately 1 paragraph) of the analysis to be performed and an explanation of the data. Include your session IDs for the expert responses, and any data manipulation performed prior to analysis should you do so.
    
ii) Exploratory Factor Analysis: conduct two separate exploratory factor analyses: the first for your selected id sessions for the expert responses, the other for the full set of amateur responses. You may present the analyses side-by-side or in sequence; however you believe is best. For each Exploratory Factor Analysis you only need to include the following:
    
• If appropriate, Cronbach Alpha output and a short discussion (2-3 lines) of whether the data is trustworthy and why.
• Correlation output of your choosing (graphical and/or numerical) with an accompanying discussion (3-4 lines). If numerical, round the correlations to 2 digits.
• A single paragraph explaining the outcome of the determinant test, Bartlett’s test of sphericity and the KMO statistic for both data sets. Do not include R output.
• Your decision regarding the number of factors to estimate (scree plot may be shown, do not show the R console output).
• The FINAL factor solution. You do not need to discuss results of any of the other solutions, however you should justify your final factor solution, including loadings, and name the factors in each analysis. You should also include up to two sentences indicating whether the test of residuals was passed and whether the factors are correlated.
• All factors should be named and an explanation as to how you come up with these names should be included.
• Based on the factor analysis results and your chosen factor names, discuss the factors that have emerged from the study. What types of differences (if any) exist between the expert and amateur sensometric ratings?
    
    
iii) Conclusions: write 2 paragraphs of conclusions based on your analysis.
    
Hints:
• To make the correlation matrix more readable, use the round() command in R, e.g.
round(cor(df), 2)
will compute the correlation matrix of the data in the matrix df, to two decimal places. You can use this tip for any other matrices too.
• The best solution may or may not be the rotated solution, based on your randomly selected sessions. Choose your solution based on the principles of a good Exploratory Factor Analysis (EFA).
• If items are not loading on to a factor, one reason could be that you have not extracted enough factors from the data. Reconsider your analysis if necessary; however, this may not solve the problem. Use the principles of EFA to make your final decision.
• While no split loadings are desirable in EFA, a small number may be unavoidable. Again you should ultimately choose your final solution based on the principles of what constitutes a good Exploratory Factor Analysis.
• If the correlations between factors suggest an oblique rotation is required, simply note this in your discussion. Do not re-run the analysis.
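As a rough illustration of the steps these hints refer to (rounded correlations, eigenvalues for a scree plot, a rotated factor solution), here is a minimal base R sketch. The data are synthetic, the variable names are invented, and `factanal()` with a varimax rotation is an assumption made for the example; the course script may use different functions and settings.

```r
# Synthetic data with a known two-factor structure (purely illustrative).
set.seed(1)
n  <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
df <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.5),
  a2 = f1 + rnorm(n, sd = 0.5),
  a3 = f1 + rnorm(n, sd = 0.5),
  b1 = f2 + rnorm(n, sd = 0.5),
  b2 = f2 + rnorm(n, sd = 0.5),
  b3 = f2 + rnorm(n, sd = 0.5)
)

# Correlation matrix rounded to two decimals; the 2 is an argument of
# round(), not of cor().
cor_matrix <- round(cor(df), 2)
print(cor_matrix)

# Eigenvalues of the correlation matrix: the quantities a scree plot displays.
eigenvalues <- eigen(cor(df))$values
print(round(eigenvalues, 2))

# Maximum-likelihood EFA with a varimax rotation (both are assumptions here).
efa <- factanal(df, factors = 2, rotation = "varimax")
print(efa$loadings, cutoff = 0.3)  # hide small loadings for readability
```

Items that load above the cutoff on more than one factor are the "split loadings" mentioned above; in this synthetic example the two blocks of items are built to separate cleanly.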
    
         
        
Question 2 (40 Marks)
Are We There Yet?
Clustering Cities Around the World
    

    
The data for this question are distances between cities in different regions of the world.

You will need to use the data set individually assigned to you. The file cities.xlsx on the Assignments page indicates the continent assigned to each student.

Each data set contains a distance matrix and can be found on the assignments page, in a file of the form RegionCitiesClustering.dat. For example, for the European data the file will be called EuropeanCitiesClustering.dat. For this question, you are asked to conduct clustering analysis using both hierarchical and partitional clustering techniques.

For the purposes of this exam a paragraph is 8-12 lines of text. Specifically, your analysis should include:
    
i) Initial Data Discussion: Write a short explanation (approximately 1 paragraph) of the analysis to be performed and an explanation of the data including any data manipulation performed prior to clustering.

ii) Hierarchical clustering: conduct hierarchical clustering on the data, choosing an appropriate AGNES-based method based on either single, complete, average-linkage or Ward’s method. Ensure you justify your choice in your write-up and include the resulting dendrogram, as well as a discussion of the outcomes of hierarchical clustering on your data.

iii) Partitional clustering: conduct a partitional clustering of your data using K-means. Ensure you explain and include any relevant R output (including graphics) supporting your choice of k, the number of clusters.

iv) Discussion: (1-2 paragraphs) of your results.

v) Validation: as a form of cluster validation, consider the following:

If there are obvious outliers or distances that should be removed, identify these in your write-up and re-run your chosen Partitional Clustering algorithm, adjusting k if necessary. Include justification of your choice of the new value for k.

If there are no obvious outliers/distances that should be removed, then explain this conclusion with justification. In this case re-run your chosen Partitional Clustering algorithm for a different value of k to that used in step iii) above. Include justification of your choice for the new value for k.

vi) Conclusions: write 2 paragraphs of conclusions based on your analysis, including a statement regarding which clustering solution is the better one and why.

    
Hint:
• For hierarchical clustering, ensure you define the height of the dendrogram according to the size of the values in the output.
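The two clustering steps requested in this question can be sketched in base R as below. Everything here is illustrative: the six cities and their distances are invented, `hclust()` stands in for an AGNES-style agglomerative method, average linkage is just one of the allowed choices, and embedding the distances with classical multidimensional scaling before K-means is an assumption of this sketch rather than the course's prescribed route.

```r
# Hypothetical distances between six made-up cities, standing in for the
# RegionCitiesClustering.dat file, whose exact layout is not shown here.
cities <- c("A", "B", "C", "D", "E", "F")
coords <- matrix(c(  0,   0,
                    10,   5,
                     5,  12,
                   400, 410,
                   395, 420,
                   410, 400),
                 ncol = 2, byrow = TRUE,
                 dimnames = list(cities, c("x", "y")))
dist_matrix <- as.matrix(dist(coords))   # symmetric matrix with a zero diagonal

# Hierarchical (agglomerative) clustering works directly on a "dist" object.
city_dist <- as.dist(dist_matrix)
hc <- hclust(city_dist, method = "average")
print(round(hc$height, 1))               # merge heights, on the scale of the distances
hc_groups <- cutree(hc, k = 2)           # cut the tree into two clusters

# K-means needs coordinates, so the distances are first embedded in two
# dimensions with classical MDS (an assumption of this sketch).
set.seed(42)
mds_coords <- cmdscale(city_dist, k = 2)

# Total within-cluster sum of squares for k = 1..4, the basis of an elbow plot.
wss <- sapply(1:4, function(k) {
  kmeans(mds_coords, centers = k, nstart = 10)$tot.withinss
})
km <- kmeans(mds_coords, centers = 2, nstart = 10)

print(hc_groups)
print(round(wss, 1))
print(km$cluster)
```

Calling `plot(hc)` draws the dendrogram; the printed merge heights show the range its vertical axis needs to cover, which is what the hint about dendrogram height refers to.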
Answered Same Day Nov 10, 2020 COMP 5070 University Of South Australia

Solution

Aakarsh answered on Nov 20 2020
Ques1/Chocolate.pdf
Sensometric qualities of Chocolates
Two groups were asked to rate the qualities of the chocolates.
The responses were recorded over a continuous scale from 0 to 10, with 0 indicating the absence of the sensometric quality and 10 indicating fully present.
The first group contained a panel of sensometric experts with responses recorded over 9 different tasting sessions.
The second group contained a panel of volunteers chosen to represent ‘regular shoppers’ who underwent a three-hour sensometric training session before rating the qualities of the chocolate over 2 different tasting sessions.
Let’s determine if experts perceive supermarket chocolate differently to non-experts (the amateurs) using 14 sensometric variables.
Initial Data Discussion
The following sensometric variables of chocolate quality were rated by the groups of experts and amateurs on a scale of 0 to 10.
## [1] "Chocolate.Aroma" "Milk.Aroma" "Sweetness"
## [4] "Acidity" "Bitterness" "Chocolate.Flavour"
## [7] "Milk.Flavour" "Caramel.Flavour" "Vanilla.Flavour"
## [10] "Astringency" "Crispy.Texture" "Melting.Texture"
## [13] "Sticky.Texture" "Granular.Texture"
Let's run an Exploratory Factor Analysis for both groups separately and find out how the variables are related to each other. We will try to identify the most useful sensometric variables and compare the results for experts and amateurs. We will check whether the data is trustworthy and how the variables are correlated using various statistical methods and tests, and visualise them using plots for better analysis. Finally, we will draw some conclusions based on the analysis performed.
Exploratory Factor Analysis
For experts data
Cronbach Alpha output
Cronbach’s alpha measures the reliability (internal consistency) of the measurement instrument, i.e. it examines whether all the variables are measuring the same underlying construct.
##
## Reliability analysis
## Call: alpha(x = choc_e_sess)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.46 0.48 0.73 0.061 0.91 0.046 3.7 0.78 0.054
##
## lower alpha upper 95% confidence boundaries
## 0.37 0.46 0.55
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## Chocolate.Aroma 0.41 0.44 0.69 0.056 0.78 0.051
## Milk.Aroma 0.50 0.51 0.73 0.074 1.04 0.042
## Sweetness 0.49 0.49 0.73 0.068 0.96 0.043
## Acidity 0.42 0.45 0.72 0.059 0.81 0.051
## Bitterness 0.47 0.49 0.72 0.068 0.95 0.046
## Chocolate.Flavour 0.44 0.46 0.71 0.062 0.87 0.049
## Milk.Flavour 0.49 0.48 0.70 0.066 0.91 0.043
## Caramel.Flavour 0.44 0.43 0.70 0.056 0.77 0.048
## Vanilla.Flavour 0.43 0.42 0.71 0.054 0.74 0.049
## Astringency 0.36 0.41 0.69 0.050 0.68 0.056
## Crispy.Texture 0.44 0.47 0.72 0.063 0.88 0.048
## Melting.Texture 0.46 0.47 0.73 0.064 0.89 0.046
## Sticky.Texture 0.41 0.42 0.72 0.053 0.73 0.050
## Granular.Texture 0.45 0.48 0.74 0.065 0.91 0.048
## var.r med.r
## Chocolate.Aroma 0.102 0.0388
## Milk.Aroma 0.097 0.0604
## Sweetness 0.110 0.0604
## Acidity 0.118 0.0604
## Bitterness 0.098 0.0604
## Chocolate.Flavour 0.098 0.0604
## Milk.Flavour 0.091 0.0604
## Caramel.Flavour 0.106 0.0604
## Vanilla.Flavour 0.116 0.0604
## Astringency 0.114 0.0309
## Crispy.Texture 0.108 0.0322
## Melting.Texture 0.117 0.0604
## Sticky.Texture 0.123 -0.0033
## Granular.Texture 0.117 0.0322
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## Chocolate.Aroma 318 0.47 0.44 0.43 0.2980 6.1 2.2
## Milk.Aroma 318 0.11 0.16 0.12 -0.0857 2.1 2.1
## Sweetness 318 0.21 0.25 0.16 -0.0033 4.3 2.3
## Acidity 318 0.45 0.40 0.33 0.2594 3.1 2.3
## Bitterness 318 0.34 0.26 0.22 0.0958 4.2 2.7
## Chocolate.Flavour 318 0.38 0.34 0.32 0.2012 6.2 2.1
## Milk.Flavour 318 0.22 0.29 0.29 0.0110 1.9 2.3
## Caramel.Flavour 318 0.37 0.45 0.43 0.2067 1.6 1.8
## Vanilla.Flavour 318 0.39 0.48 0.42 0.2648 1.3 1.4
## Astringency 318 0.60 0.53 0.51 0.4162 3.6 2.6
## Crispy.Texture 318 0.37 0.33 0.26 0.1788 5.9 2.2
## Melting.Texture 318 0.29 0.31 0.22 0.0940 4.8 2.2
## Sticky.Texture 318 0.46 0.48 0.38 0.2764 3.7 2.2
## Granular.Texture 318 0.33 0.30 0.18 0.1393 2.9 2.1
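Raw alpha is 0.46 (95% confidence interval 0.37 to 0.55), which is weak: values of about 0.7 or higher are conventionally regarded as acceptable. Dropping any single variable changes alpha only slightly (it stays between 0.36 and 0.50), so all variables are kept. This shows the data is not very reliable.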
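For reference, raw Cronbach's alpha can be computed by hand from the item variances and the variance of the total score. The sketch below is a base R illustration on made-up ratings, not the psych call that produced the output above; the function name `cronbach_alpha` and the simulated items are hypothetical.

```r
# Raw Cronbach's alpha: k / (k - 1) * (1 - sum of item variances / variance of total).
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_variances <- apply(items, 2, var)
  total_variance <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
}

# Hypothetical ratings: three items driven by one common trait plus noise.
set.seed(7)
trait <- rnorm(200)
ratings <- data.frame(
  item1 = trait + rnorm(200, sd = 0.6),
  item2 = trait + rnorm(200, sd = 0.6),
  item3 = trait + rnorm(200, sd = 0.6)
)
alpha_consistent <- cronbach_alpha(ratings)

# Flipping the sign of one item mimics an item that runs against the others.
ratings_mixed <- transform(ratings, item3 = -item3)
alpha_mixed <- cronbach_alpha(ratings_mixed)

print(round(c(consistent = alpha_consistent, mixed = alpha_mixed), 2))
```

The second call shows how alpha drops when some items are negatively related to the rest. That may be relevant here, since several of the 14 sensometric variables are negatively correlated with one another (for example Milk.Aroma and Chocolate.Aroma).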
Correlation Matrix
Here correlation is represented using color intensity.
## Chocolate.Aroma Milk.Aroma Sweetness Acidity Bitterness
## Chocolate.Aroma 1.00 -0.56 -0.26 0.28 0.48
## Milk.Aroma -0.56 1.00 0.30 -0.05 -0.41
## Sweetness -0.26 0.30 1.00 -0.22 -0.51
## Acidity 0.28 -0.05 -0.22 1.00 0.42
## Bitterness 0.48 -0.41 -0.51 0.42 1.00
## Chocolate.Flavour 0.72 -0.49 -0.43 0.24 0.61
## Milk.Flavour -0.42 0.77 0.42 -0.13 -0.50
## Caramel.Flavour -0.20 0.48 0.30 -0.03 -0.31
## Vanilla.Flavour -0.01 0.28 0.21 -0.07 -0.21
## Astringency 0.34 -0.21 -0.15 0.49 0.59
## Crispy.Texture 0.60 -0.47 -0.06 0.11 0.33
## Melting.Texture -0.11 0.28 0.38 -0.24 -0.19
## Sticky.Texture 0.05 0.13 0.31 0.01 -0.21
## Granular.Texture 0.27 -0.24 -0.07 0.21 0.19
## Chocolate.Flavour Milk.Flavour Caramel.Flavour
## Chocolate.Aroma 0.72 -0.42 -0.20
## Milk.Aroma -0.49 0.77 0.48
## Sweetness -0.43 0.42 0.30
## Acidity 0.24 -0.13 -0.03
## Bitterness 0.61 -0.50 -0.31
## Chocolate.Flavour 1.00 -0.47 -0.24
## Milk.Flavour -0.47 1.00 0.70
## Caramel.Flavour -0.24 0.70 1.00
## Vanilla.Flavour -0.06 0.45 0.61
## Astringency 0.33 -0.26 -0.09
## Crispy.Texture 0.48 -0.46 -0.32
## Melting.Texture -0.30 0.38 0.27
## Sticky.Texture -0.03 0.27 0.25
## Granular.Texture 0.35 -0.29 -0.19
## Vanilla.Flavour Astringency Crispy.Texture
## Chocolate.Aroma -0.01 0.34 0.60
## Milk.Aroma 0.28 -0.21 -0.47
## Sweetness 0.21 -0.15 -0.06
## Acidity -0.07 0.49 0.11
## Bitterness -0.21 0.59 0.33
## Chocolate.Flavour -0.06 0.33 0.48
## Milk.Flavour 0.45 -0.26 -0.46
## Caramel.Flavour 0.61 -0.09 -0.32
## Vanilla.Flavour 1.00 0.01 -0.16
## Astringency 0.01 1.00 0.26
## Crispy.Texture -0.16 0.26 1.00
## Melting.Texture 0.20 0.00 0.00
## Sticky.Texture 0.26 0.08 0.07
## Granular.Texture -0.13 0.30 0.26
## Melting.Texture Sticky.Texture Granular.Texture
## Chocolate.Aroma -0.11 0.05 0.27
## Milk.Aroma 0.28 0.13 -0.24
## Sweetness 0.38 0.31 -0.07
## Acidity -0.24 0.01 0.21
## Bitterness -0.19 -0.21 0.19
## Chocolate.Flavour -0.30 -0.03 0.35
## Milk.Flavour 0.38 0.27 -0.29
## Caramel.Flavour 0.27 0.25 -0.19
## Vanilla.Flavour 0.20 0.26 -0.13
## Astringency 0.00 0.08 0.30
## Crispy.Texture 0.00 0.07 0.26
## Melting.Texture 1.00 0.14 -0.24
## Sticky.Texture 0.14 1.00 0.08
## Granular.Texture -0.24 0.08 1.00
Chocolate.Aroma is positively correlated with Bitterness (0.48), Chocolate.Flavour (0.72) and Crispy.Texture (0.60), and negatively with Milk.Aroma (-0.56) and Milk.Flavour (-0.42).
Milk.Aroma is positively correlated with Milk.Flavour (0.77), Caramel.Flavour (0.48), Sweetness (0.30) and Vanilla.Flavour (0.28).
Sticky.Texture and Granular.Texture are only very weakly positively correlated (0.08).
Determinant, Bartlett and KMO Test
## [1] 0.001128851
## $chisq
## [1] 634.5429
##
## $p.value
## [1] 2.186798e-82
##
## $df
## [1] 91
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r =...
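The Bartlett figures above can be checked by hand from the reported determinant, using the standard statistic chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. The sketch below is a base R illustration; the function name `bartlett_sphericity` is hypothetical. Plugging in the reported determinant reproduces the reported chi-square of 634.5429 when n = 100, whereas the item statistics list n = 318. This appears consistent with the sample size not having been passed to the test, since psych's `cortest.bartlett` assumes n = 100 when n is not supplied.

```r
# Bartlett's test of sphericity computed from the determinant of a
# correlation matrix with p variables, based on n observations.
bartlett_sphericity <- function(det_r, n, p) {
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det_r)
  df    <- p * (p - 1) / 2
  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

det_r <- 0.001128851   # determinant reported in the output above
p     <- 14            # number of sensometric variables

res_100 <- bartlett_sphericity(det_r, n = 100, p = p)  # n that matches the reported chi-square
res_318 <- bartlett_sphericity(det_r, n = 318, p = p)  # n listed in the item statistics

print(round(c(n100 = res_100$chisq, n318 = res_318$chisq), 2))
print(res_100$df)
```

Either way the test rejects sphericity decisively, so the conclusion that the variables are sufficiently correlated for factor analysis does not change; only the size of the statistic does.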