Python basics.pdf
B Installing and Managing Python
Installing Python via Anaconda
Managing Packages
Conda
Pip
Workflows
Text Editor + Terminal
Jupyter Notebook
Integrated Development Environments
C NumPy Visual Guide
Data Access
[Figure: two data-access examples, each marking entries of a 4 x 5 array]
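The index expressions from the data-access figure did not survive, so the following is only a minimal sketch of the usual ways to read a row, a column, and a single entry. The array `A` and the chosen indices are hypothetical stand-ins, not the figure's own values.

```python
import numpy as np

# Hypothetical 4 x 5 array standing in for the arrays in the figure.
A = np.arange(20).reshape(4, 5)

row = A[1]        # the whole second row, shape (5,)
col = A[:, 2]     # the whole third column, shape (4,)
entry = A[1, 2]   # the single entry in row 1, column 2

print(row)    # [5 6 7 8 9]
print(col)    # [ 2  7 12 17]
print(entry)  # 7
```

Indices start at 0, so `A[1]` is the second row rather than the first.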
Slicing
[Figure: four slicing examples, each marking a sub-block of a 4 x 5 array]
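As a rough illustration of slicing a 4 x 5 array, the sketch below takes a few sub-blocks with `start:stop:step` syntax. The particular slices are assumptions chosen for illustration; the ones drawn in the figure could not be recovered.

```python
import numpy as np

# Hypothetical 4 x 5 array; the slices below are illustrative choices.
A = np.arange(20).reshape(4, 5)

top_rows = A[:2]            # first two rows, all columns -> shape (2, 5)
right_cols = A[:, 3:]       # all rows, last two columns  -> shape (4, 2)
block = A[1:3, 1:4]         # rows 1-2, columns 1-3       -> shape (2, 3)
every_other = A[::2, ::2]   # every second row and column -> shape (2, 3)

# Basic slices are views: writing through the slice changes A too.
block[0, 0] = -1
print(A[1, 1])  # -1
```

Because a basic slice is a view rather than a copy, call `.copy()` on it when the original array must stay unchanged.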
Stacking
[Figure: stacking examples. Two 3 x 3 arrays are joined in the order first, second, first: side by side into a 3 x 9 array, and on top of one another into a 9 x 3 array. Two length-4 vectors are joined in the same order into a length-12 vector, into the rows of a 3 x 4 array, and into the columns of a 4 x 3 array.]
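The shapes in the stacking figure match what `np.hstack`, `np.vstack`, and `np.column_stack` produce, although the function names themselves are not legible in the extracted text. Treat the sketch below as one plausible reading; the zero and one fill values are placeholders for the two kinds of entries.

```python
import numpy as np

A = np.zeros((3, 3))   # stands in for the first 3 x 3 array
B = np.ones((3, 3))    # stands in for the second 3 x 3 array

wide = np.hstack((A, B, A))   # side by side        -> shape (3, 9)
tall = np.vstack((A, B, A))   # on top of each other -> shape (9, 3)

x = np.zeros(4)        # stands in for the first length-4 vector
y = np.ones(4)         # stands in for the second length-4 vector

flat = np.hstack((x, y, x))         # one long vector    -> shape (12,)
rows = np.vstack((x, y, x))         # vectors as rows    -> shape (3, 4)
cols = np.column_stack((x, y, x))   # vectors as columns -> shape (4, 3)

print(wide.shape, tall.shape, flat.shape, rows.shape, cols.shape)
```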
Broadcasting
[Figure: broadcasting examples. A is a 3 x 3 array whose rows are all 1 2 3. Adding a length-3 vector to A adds it to every row; adding the column vector with entries 10, 20, 30 adds it to every column.]
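A small sketch of the two broadcasting cases follows. It assumes the row vector holds the same values 10, 20, 30 that appear in the column form; the row form is not legible in the extracted figure, so that value is a guess made for illustration.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]])
x = np.array([10, 20, 30])   # assumed values, taken from the column form

by_row = A + x                  # x is added to every row of A
by_col = A + x.reshape(-1, 1)   # x as a column is added to every column of A

print(by_row)
# [[11 22 33]
#  [11 22 33]
#  [11 22 33]]
print(by_col)
# [[11 12 13]
#  [21 22 23]
#  [31 32 33]]
```

`reshape(-1, 1)` turns the length-3 vector into a 3 x 1 column, which is what makes NumPy stretch it across columns instead of rows.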
Operations along an Axis
[Figure: operations along an axis. A is a 4 x 4 array whose rows are all 1 2 3 4. Two examples reduce A along one axis each, and each result is a length-4 vector.]
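The operation used in the figure is not legible, so the sketch below assumes a sum, which is a common choice for demonstrating the `axis` argument. With `axis=0` the reduction runs down each column; with `axis=1` it runs across each row.

```python
import numpy as np

# 4 x 4 array whose rows are all 1 2 3 4, as in the figure.
A = np.array([[1, 2, 3, 4]] * 4)

col_sums = A.sum(axis=0)   # collapse the rows: one total per column
row_sums = A.sum(axis=1)   # collapse the columns: one total per row

print(col_sums)  # [ 4  8 12 16]
print(row_sums)  # [10 10 10 10]
```

The same `axis` argument works for other reductions such as `mean`, `max`, and `min`.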
analytics.pptx
Introduction to Analytics
and Big Data Visualization
OVERVIEW
Data-driven
Data Science
Data Analytics
What is the difference?
What is
Data-driven?
Being data-driven means
making business decisions and
managing processes
based on facts and
insights derived from data
OVERVIEW
Why
Data-driven?
OVERVIEW
Most companies
have information systems
but are not data-driven
OVERVIEW
These companies
have information systems
ERP system
CRM module
Accounting system
OVERVIEW
But they are not aware whether the data is
accurate
up to date
OVERVIEW
Moreover, they may not know
how to make
the best use of the data
OVERVIEW
They may be
doing transactions and
making decisions
with inaccurate data,
with out-of-date data, or
ignoring the data
OVERVIEW
Making decisions ignoring the data?
Based on feelings
Based on personal experiences
Biased
OVERVIEW
A data-driven organization
puts data at the core of their
business processes,
using facts and insights derived from data
to drive their decision-making
OVERVIEW
A data-driven organization
moves from guessing and
assumptions
to using data and analytics
to make faster and better decisions
OVERVIEW
Relevant and accurate data are
at the core
of a data-driven organization
OVERVIEW
What is
Data Science?
INTRODUCTION
Data Science is the field of study that combines
computer science
statistics
business
to find useful information
from raw data.
INTRODUCTION
Why
Statistics?
DATA SCIENCE
Statistics = data analysis
DATA SCIENCE
Statistics is data
collection
cleaning
organization
visualization
analysis
modeling
presentation
DATA SCIENCE
data visualization
static
interactive
animation
DATA SCIENCE
IBM Burning Glass Tech Report
STATIC DATA VISUALIZATION
Interactive Data Visualization Example 1
INTERACTIVE DATA VISUALIZATION
Interactive Data Visualization Example 2
INTERACTIVE DATA VISUALIZATION
Data Animation Example
DATA ANIMATION
data modeling
Generalized linear models
Bayesian modeling
cluster analysis
time series modeling
principal components
partial least squares
spatial analysis
DATA SCIENCE
Statistical tools to understand and analyze the data
Random variables
density functions
Outliers
Covariance, correlation
Probabilities
Bootstrapping
Confidence and Prediction Intervals
DATA SCIENCE
Why
Computer Science?
DATA SCIENCE
Computer Science tools
to collect, process, and store the data
Data Wrangling (unstructured to structured data)
Data Warehousing (repository of structured data)
Cloud computing
Big data
Machine learning models
Web development (front end)
DATA SCIENCE
Why
Business?
DATA SCIENCE
Business domain knowledge
to ask the right questions about
Customer requirements
Products
Processes
Variables
KPIs
Environment variables
DATA SCIENCE
Business domain = Industry
Retail
Health care
Financial
Manufacturing
Government
Services
DATA SCIENCE
Business domain = Science
Biology
Medicine
Physics
Materials science
Chemistry
DATA SCIENCE
What is
Data Analytics?
DATA ANALYTICS
A Data Analytics professional is someone whose
focus is on
collecting,
summarizing, and
analyzing data
to find answers to business questions
DATA ANALYTICS
Data Analyst
Business Analyst
DATA ANALYTICS
Who is the Business Analyst?
What are the Business questions?
DATA ANALYTICS
Business Analyst
Decision Maker (questions)
Data Analyst (solutions)
DATA ANALYTICS
Business questions
What happened?
What will happen?
DATA ANALYTICS
What happened? business case
Which products underperformed?
Which were more profitable?
Did our market share change?
What is our retention rate?
Who are our most valuable customers?
DATA ANALYTICS
What will happen? business case
What is the expected growth?
Who are potential customers?
Most promising product lines?
What market share can we expect?
What new competitors may arise?
DATA ANALYTICS
What will happen? new product
What is the probability of success?
What is the risk of failure?
What is the market acceptance rate?
Will it outperform the current best product?
DATA ANALYTICS
What will happen? investment
What is the expected return?
What is the probability of a loss?
If there is a loss, how large can it be?
What scenarios are possible?
Major external risk in our sector?
DATA ANALYTICS
How does the
Data Analyst
answer these questions?
DATA ANALYTICS
DATA ANALYST – KEY MEASURES YOU SHOULD KNOW
percentages
weighted average
percentile/quantile
absolute, relative change
net, gross change
growth rate
mean, median, variance
range
covariance
correlation
distribution
R-squared
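Several of the measures listed above can be computed in a few lines of NumPy. The sketch below uses made-up sales, price, and unit figures purely for illustration, and it relies on the standard textbook definitions (for example, relative change as absolute change divided by the starting value), which may differ from the definitions used elsewhere in this deck.

```python
import numpy as np

# Hypothetical figures, for illustration only.
sales = np.array([120.0, 135.0, 150.0, 180.0])   # four consecutive periods
prices = np.array([10.0, 12.0, 11.0, 13.0])
units = np.array([100, 50, 200, 150])

mean = sales.mean()
median = np.median(sales)
variance = sales.var(ddof=1)               # sample variance
value_range = sales.max() - sales.min()
p90 = np.percentile(sales, 90)             # 90th percentile

# Weighted average: each price counts in proportion to units sold.
weighted_avg_price = np.average(prices, weights=units)

absolute_change = sales[-1] - sales[0]
relative_change = absolute_change / sales[0]   # 0.5 means +50%
growth_rates = np.diff(sales) / sales[:-1]     # period-over-period growth

covariance = np.cov(prices, sales)[0, 1]
correlation = np.corrcoef(prices, sales)[0, 1]
r_squared = correlation ** 2   # R-squared of a one-variable linear fit

print(mean, median, weighted_avg_price, relative_change)
```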
DATA ANALYST – KEY MEASURES YOU SHOULD KNOW
gross change =
net change = XXXXXXXXXX
net change is also called relative change
DATA ANALYST – KEY MEASURES YOU SHOULD KNOW
Example: If there is a loss,
how large can it be?
Collect past data
Find distribution of daily losses
Find 95% quantile of daily losses (Value at Risk, VaR)
Find expected loss beyond that quantile (expected shortfall)
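The steps above can be sketched on synthetic data. The normally distributed losses below are an assumption made so the example runs on its own; a real analysis would load past daily losses instead. In common usage the 95% quantile of the loss distribution is the Value at Risk, and the average loss beyond it is the expected shortfall.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily losses (positive = loss), standing in for collected past data.
losses = rng.normal(loc=0.0, scale=1.0, size=10_000)

# 95% quantile of daily losses: on 95% of days the loss is at most this value.
var_95 = np.quantile(losses, 0.95)

# Expected loss beyond that quantile: the mean of the worst 5% of days.
tail = losses[losses > var_95]
expected_tail_loss = tail.mean()

print(f"95% quantile of losses:   {var_95:.2f}")
print(f"expected loss beyond it:  {expected_tail_loss:.2f}")
```

The tail average is always at least as large as the quantile itself, which is why it answers "how large can the loss be?" more directly.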
DATA ANALYTICS
[Figure: distribution of daily losses with the 95% quantile marked]
DATA ANALYTICS
Example: Medicine
Business Question
Predict tumor outcome (benign or malignant) based on tissue measurements
Collect lab data about variables related to cancer tumors
Build classification model
DATA ANALYTICS
Data Analyst
Searches subsets of variables
to identify malignant cancer
Uses a PCA plot to verify whether the principal components (PCs) can identify cancer
Develops a decision boundary
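The workflow above can be sketched on synthetic data. Everything in the example is an assumption: the five "tissue measurements" are random numbers with hypothetical class means, PCA is done with a plain SVD, and the decision boundary is simply the midpoint between the two class means on the first principal component. It is not the model used in the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for lab data: 5 measurements per sample.
# Label 0 = benign, 1 = malignant; the class means are hypothetical.
n = 200
benign = rng.normal(loc=0.0, scale=1.0, size=(n, 5))
malignant = rng.normal(loc=3.0, scale=1.0, size=(n, 5))
X = np.vstack((benign, malignant))
y = np.array([0] * n + [1] * n)

# PCA via SVD of the centered data; rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]   # scores on the first principal component

# Decision boundary: midpoint between the two class means on PC1.
m0, m1 = pc1[y == 0].mean(), pc1[y == 1].mean()
threshold = (m0 + m1) / 2
pred = (pc1 > threshold) if m1 > m0 else (pc1 < threshold)

# In-sample accuracy only; a real study would hold out test data.
accuracy = (pred.astype(int) == y).mean()
print(f"accuracy using PC1 alone: {accuracy:.3f}")
```

Plotting `pc1` against the second component (`Xc @ Vt[1]`), colored by label, gives the kind of PCA plot the slide refers to.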
DATA ANALYTICS
Sampling variation
Sampling error
Standard error
Statistically significant difference vs. true difference
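A short simulation can make the first three concepts concrete. The population parameters below (mean 50, standard deviation 10) and the sample size are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population parameters and sample size.
true_mean, true_sd, n = 50.0, 10.0, 100

# One sample: its mean misses the true mean by the sampling error,
# which is unknown in practice because the true mean is unknown.
sample = rng.normal(true_mean, true_sd, size=n)
sampling_error = sample.mean() - true_mean

# Standard error: estimate, from the one sample, of how much sample means vary.
standard_error = sample.std(ddof=1) / np.sqrt(n)

# Sampling variation: means of many independent samples scatter around 50.
means = rng.normal(true_mean, true_sd, size=(1000, n)).mean(axis=1)
spread = means.std(ddof=1)

print(f"sampling error of one sample: {sampling_error:+.2f}")
print(f"standard error estimate:      {standard_error:.2f}")
print(f"spread of 1000 sample means:  {spread:.2f}")
```

The standard error estimated from a single sample should land close to the observed spread of the repeated sample means, both near 10 / sqrt(100) = 1.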
DATA ANALYST – KEY CONCEPTS YOU SHOULD KNOW
Data Analytics focuses on answering business questions
What happened?
What will happen?
DATA ANALYTICS
What happened? Descriptive Analytics
Descriptive Stats
Summary Tables (crosstabs, pivot tables)
Data visualization
Dashboards
Why did it happen? Diagnostic Analytics
What may happen? Predictive Analytics
Prediction Models
Classification Models
Clustering methods
DATA ANALYTICS
Descriptive Analytics
Diagnostic Analytics
Predictive Analytics
Prescriptive Analytics
DATA ANALYTICS
Past performance
Historical data
Today
observe & predict
Future performance
results
ANALYTICS
What happened?
Describe/summarize data
Descriptive Stats
Barplots, scatterplots, boxplots
Line charts, Histograms
Averages, std. deviations
correlations
What may happen?
scenarios
Prediction Models
prediction models
classification models