ISYE 431/531 Reliability Engineering: Group Project
Submission Due: 11/30
 The purpose of this project is to analyze the failure characteristics using appropriate life
distributions and their properties.
 Each team consists of two people, and after submitting the project presentation and report, peer reviews of each other are conducted through a questionnaire. This will account for 10% of the individual's project grade.
o https://niu.az1.qualtrics.com/jfe/form/SV_bQotv7uK4WnHzed
 Project submission materials include 1) PowerPoint presentation, 2) report written in Word
(page limit is 10; double spaced; 12-point font; excluding references), and 3) software project
files (MINITAB) via Blackboard.
 MINITAB is required for your project.
 The report should include 1) project objective, 2) problem statement, 3) data description, 4) results, 5) conclusions, and 6) appendix if needed.
 The report should describe the background, scope of work, any assumptions made, the
analysis methods, and the data results.
 Please choose one from the suggested topics. The data set and relevant resources are
available on Blackboard. Multiple teams cannot choose the same topic. If you want to propose
your own topic, please consult with an instructor.
o Driverless car disengagement
o Hard drive failure
o Copper wire failure
o Wind turbine failure
o Paper clip failure experiment
 Each team will have 15 minutes of presentation and 3 minutes of Q&A.
 Here is a grading rubric for the project.
Category | 100% | 85% | 70% | 50%
Completion | Fully completed assignment | Partially completed assignment | Barely completed assignment | Did not complete assignment
Accuracy | Few errors | Some errors | Many errors | Did not complete
Effort/Neatness | Showed excellent effort and all related work is shown neatly and well organized | Showed good effort and most of the related work is shown neatly and well organized | Showed little effort and little of the related work is shown; work is not neat and/or well organized | Did not complete
Presentation | Excellent delivery, enthusiasm, and audience awareness | Good delivery, enthusiasm, and audience awareness | Fair delivery, enthusiasm, and audience awareness | Needs improvement for delivery, enthusiasm, and audience awareness

https://www.kaggle.com/ezeydan/hard-drive-failure-data?select=hard_drive_failure_data.csv
Context
This dataset contains data between January 2016 and December XXXXXXXXXXyears) filtered and preprocessed from up to 55 million entries from the collected BackBlaze dataset.
Content
The dataset includes disk properties in the system such as total disk capacity, usage details, failure figures and daily drive status information. This dataset includes the calculated lifetime distributions of each unique hard disk.
Acknowledgements
The original dataset collection can be found here: https://www.backblaze.com/b2/hard-drive-test-data.html
Backblaze states that you can download and use this data for free for your own purposes. In return, Backblaze asks three things: 1) cite Backblaze as the source if you use the data, 2) accept that you are solely responsible for how you use the data, and 3) do not sell this data to anyone; it is free.
Inspiration
Reference paper that used this dataset:
S. S. Arslan, E. Zeydan, "On the Distribution Modeling of Heavy-Tailed Disk Failure Lifetime in Big Data Centers", submitted to IEEE Trans. on Reliability, April 2019.

Solution

Project Objective
In today's data centres, the primary and most commonly used storage devices are hard disk drives (HDDs) and, more recently, solid-state drives (SSDs). Because these devices often share the same hardware and network design and sit close together in the same area, they face an increased risk of experiencing identical or near-identical fault scenarios. Likewise, the manufacturing methods and technologies used in their construction can compromise storage device connections. As a result, a single problem with the hardware or the network can cause several storage devices to fail or stop working at the same time.
Calculating the probability density function (PDF) of failures is necessary to assess storage device reliability, because the reliability function R(t) is intrinsically linked to the cumulative distribution function F(t) through the equation R(t) = 1 − F(t). A reliability analyst must overcome several obstacles to estimate the remaining life expectancy of the storage devices in a data centre. One is the complexity of the prediction model, which must account for correlated failure scenarios. This issue may be alleviated by working with the collected data directly, since the data itself preserves the association between failures. Traditional approaches to the density estimation problem, on the other hand, look inapplicable because they require prior knowledge of the underlying PDF of the failures, and the appropriate smoothing parameter is usually unknown.
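As a minimal illustration of the relationship R(t) = 1 − F(t), the sketch below fits a two-parameter Weibull lifetime distribution to a small failure-time sample with SciPy and evaluates the reliability at a chosen mission time. The sample values and variable names are hypothetical, and in the project itself the equivalent fitting would be carried out in MINITAB.

```python
# Minimal sketch: relate the fitted CDF F(t) to the reliability R(t) = 1 - F(t).
# The failure-time sample below is made up for illustration only.
import numpy as np
from scipy import stats

lifetimes_hours = np.array([1200., 1850., 2300., 3100., 4000., 5200., 6100.])

# Fit a 2-parameter Weibull (location fixed at 0) to the observed lifetimes.
shape, loc, scale = stats.weibull_min.fit(lifetimes_hours, floc=0)

t = 2000.0  # mission time of interest, in hours
F_t = stats.weibull_min.cdf(t, shape, loc=loc, scale=scale)  # unreliability F(t)
R_t = 1.0 - F_t                                              # reliability R(t)

print(f"Weibull shape={shape:.2f}, scale={scale:.0f} h")
print(f"F({t:.0f} h) = {F_t:.3f},  R({t:.0f} h) = {R_t:.3f}")
```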
Kernel density estimation (KDE) is a popular method for estimating PDFs. To fit a density function to the observed data, this approach chooses a kernel function and a smoothing (bandwidth) value. In the literature, the smoothing parameter is used to adjust the bias-variance tradeoff. However, the structure of the underlying PDF (light tail, heavy tail, etc.) has a substantial impact on the choice of kernel function and smoothing value. Given that this PDF is frequently unavailable, density estimation becomes a challenging task. If, on the other hand, prior statistical information about the data is available, selecting the kernel and smoothing value may be straightforward. For instance, it has been established that a Gaussian kernel performs best when the smoothing value is h = 1.06 σ N^(−0.2), where σ is the sample standard deviation and N is the number of data samples.
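The rule-of-thumb bandwidth mentioned above can be sketched directly; the snippet below builds a Gaussian KDE by hand with h = 1.06 σ N^(−0.2) on a hypothetical lifetime sample, purely to show how the smoothing value enters the estimator.

```python
# Sketch: Gaussian kernel density estimate with the rule-of-thumb bandwidth
# h = 1.06 * sigma * N^(-1/5). The lifetime sample is made up for illustration.
import numpy as np

lifetimes = np.array([1200., 1850., 2300., 3100., 4000., 5200., 6100., 7500.])
sigma = lifetimes.std(ddof=1)          # sample standard deviation
N = lifetimes.size
h = 1.06 * sigma * N ** (-1 / 5)       # smoothing (bandwidth) value

def kde_pdf(t, data, bandwidth):
    """Evaluate the Gaussian KDE at the points t."""
    t = np.atleast_1d(t)[:, None]
    z = (t - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth

grid = np.linspace(0, 10000, 5)
print("bandwidth h =", round(h, 1))
print("estimated PDF on grid:", kde_pdf(grid, lifetimes, h))
```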
A different approach to the density estimation problem is to apply a power transformation to the data so that it closely resembles one of the well-known distributions. In other words, we recommend using parametric estimation in the transformed domain rather than nonparametric estimation methods. Here the data are power transformed using the Box-Cox transformation, which is normally used to satisfy the standard linear model assumption that the input distribution is Normal. This work will demonstrate that the disc failure lifetime is one of many data sets that cannot be transformed to Normal; not all data can be. However, after some scaling and translation, it turns out that the transformed data closely resemble the Argus distribution. We also suggest a method for quickly and accurately estimating the underlying PDF and its statistical features. We used the publicly available Backblaze dataset to validate our model and quantify our findings. It is essential to keep in mind that the recommended approach works with more than disc data: the proposed system can accurately estimate failure data from any storage device, especially SSDs, whose power-transformed PDF is Argus-like.
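A brief sketch of the Box-Cox step is given below, using SciPy's boxcox on a synthetic, strictly positive lifetime sample and a Shapiro-Wilk check of how Normal the transformed data look; it illustrates the transformation itself, not the Argus fit proposed in the reference paper.

```python
# Sketch: Box-Cox power transformation of lifetime data, followed by a quick
# check of how close the transformed sample is to Normal. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lifetimes = rng.weibull(1.5, size=500) * 3000.0 + 1.0   # strictly positive sample

transformed, lam = stats.boxcox(lifetimes)               # fitted Box-Cox lambda
stat, p_value = stats.shapiro(transformed)               # normality check

print(f"Box-Cox lambda = {lam:.3f}")
print(f"Shapiro-Wilk p-value on transformed data = {p_value:.3f}")
```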
PLATFORM FOR PERFORMANCE EVALUATION
I. The Backblaze Dataset Employed
Around 80,000 disc snapshots taken over many years in the Backblaze data centre [50] comprise the data we use. The database stores various fields, including the date, serial number, model, capacity, operating status, and SMART-based indicators. Our dataset has additionally been published on the Kaggle platform. Statistics may vary according to model, manufacturer, and serial number, so a distribution can only be created by grouping the data accordingly. An example of such a grouping is shown in Fig. 3. Additionally, capacity-based categories may be established. Finally, SMART data lets us include discs of various models and manufacturers in our modelling and grouping activities.
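As a rough sketch of the grouping step, the snippet below loads the Kaggle CSV with pandas and derives a per-drive lifetime sample grouped by model. The file name and the column names (date, serial_number, model, failure) follow the usual Backblaze export convention and should be checked against the actual download before use.

```python
# Sketch: group daily drive snapshots by model and estimate per-drive lifetimes.
# Column names follow the usual Backblaze export convention; verify them
# against the actual CSV before running.
import pandas as pd

df = pd.read_csv("hard_drive_failure_data.csv", parse_dates=["date"])

# Lifetime of each drive: days between its first and last recorded snapshot.
per_drive = (
    df.groupby(["model", "serial_number"])
      .agg(first_seen=("date", "min"),
           last_seen=("date", "max"),
           failed=("failure", "max"))
      .reset_index()
)
per_drive["lifetime_days"] = (per_drive["last_seen"] - per_drive["first_seen"]).dt.days

# One empirical lifetime sample per model, for model-by-model distribution fitting.
by_model = per_drive.groupby("model")["lifetime_days"].describe()
print(by_model.head())
```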
II. The Suggested Platform's Features
The suggested platform works with hard drive statistics pre-gathered from a variety of drive devices housed in the Backblaze data centre. To enable correct analysis throughout the analytical framework, Fig. 4 depicts the complete architecture of our platform arrangement, including MTTF calculations based on hard drive lifetime data. The framework consists of five main modules: the data source, data collection, data processing and storage, visualization, and analysis results layers.
The suggested platform combines open-source analysis tools with data storage and query engine technology. The hard drive data and associated statistics are acquired from the data centre, as shown in Fig. 4, and the data are handled progressively inside the various data layers, denoted by steps (2) and (3). Preprocessing of the data is shown in Fig. 4, step (2). During the preprocessing stage, every piece of data collected from drives of various manufacturers is processed and aggregated, including serial numbers and fault records, with filtering and aggregation applied. Step (3) of the procedure stores the preprocessed data as a CSV document in a file system.
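A hypothetical sketch of the preprocessing stage, steps (2) and (3), is shown below: the raw snapshots are filtered to the needed fields, fault records are aggregated per serial number, and the result is written out as a CSV document for the collection layer. All file and column names are illustrative.

```python
# Sketch of the preprocessing stage (steps (2)-(3)): filter the raw snapshots,
# aggregate fault records per serial number, and persist the result as a CSV
# file for the collection layer. File and column names are illustrative.
import pandas as pd

raw = pd.read_csv("raw_drive_snapshots.csv", parse_dates=["date"])

# Keep only the fields needed downstream and aggregate fault records per drive.
kept = raw[["date", "serial_number", "model", "capacity_bytes", "failure"]]
aggregated = (
    kept.groupby("serial_number")
        .agg(model=("model", "first"),
             capacity_bytes=("capacity_bytes", "first"),
             days_observed=("date", "nunique"),
             failures=("failure", "sum"))
        .reset_index()
)

# Step (3): store the preprocessed data as a CSV document in the file system.
aggregated.to_csv("drive_summary.csv", index=False)
```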
In the data collection layer, step (4) in Fig. 4, a Logstash component includes the Logstash listener and Logstash transformer subcomponents. The tasks of logging, parsing, and conversion are carried out by the Logstash log parsing engine. The Logstash listener watches the document created in step (3) and transmits the data to the Logstash transformer, which converts the CSV data from the file system into the Elasticsearch data format for further archiving and analysis. In step (5) of Fig. 4, we use Elasticsearch, an open-source, scalable full-text search and data analysis engine, to process and store the data. Elasticsearch makes it possible to process queries over both structured and unstructured data. Data enrichment, storage, access, and analysis are done inside the data processing and storage layer. Consequently, this layer is used to perform all computations related to fitting the data to the best MTTF estimates and PDFs (for failure...
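The report performs the CSV-to-Elasticsearch step with Logstash; a rough functional equivalent in Python, assuming the elasticsearch client package, a local cluster at http://localhost:9200, a hypothetical drive-lifetimes index, and the drive_summary.csv file sketched above, might look like the following. It only sketches the indexing step and is not the platform's actual Logstash configuration.

```python
# Sketch: bulk-index the preprocessed CSV into Elasticsearch for later querying.
# Host, index name, and file name are placeholders for illustration only.
import csv
from elasticsearch import Elasticsearch, helpers

es = Elasticsearch("http://localhost:9200")

def rows(path):
    """Yield one bulk-index action per CSV row."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            yield {"_index": "drive-lifetimes", "_source": row}

helpers.bulk(es, rows("drive_summary.csv"))
```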