Introduction xxiii
Chapter 1 Introduction to Machine Learning 1
What Is Machine Learning? 2
What Problems Will Machine Learning Be Solving in This Book? 3
Classification 4
Regression 4
Clustering 5
Types of Machine Learning Algorithms 5
Supervised Learning 5
Unsupervised Learning 7
Getting the Tools 8
Obtaining Anaconda 8
Installing Anaconda 9
Running Jupyter Notebook for Mac 9
Running Jupyter Notebook for Windows 10
Creating a New Notebook 11
Naming the Notebook 12
Adding and Removing Cells 13
Running a Cell 14
Restarting the Kernel 16
Exporting Your Notebook 16
Getting Help 17
Chapter 2 Extending Python Using NumPy 19
What Is NumPy? 19
Creating NumPy Arrays 20
Array Indexing 22
Boolean Indexing 22
Slicing Arrays 23
NumPy Slice Is a Reference 25
Reshaping Arrays 26
Array Math 27
Dot Product 29
Matrix 30
Cumulative Sum 31
NumPy Sorting 32
Array Assignment 34
Copying by Reference 34
Copying by View (Shallow Copy) 36
Copying by Value (Deep Copy) 37
Chapter 3 Manipulating Tabular Data Using Pandas 39
What Is Pandas? 39
Pandas Series 40
Creating a Series Using a Specified Index 41
Accessing Elements in a Series 41
Specifying a Datetime Range as the Index of a Series 42
Date Ranges 43
Pandas DataFrame 45
Creating a DataFrame 45
Specifying the Index in a DataFrame 46
Generating Descriptive Statistics on the DataFrame 47
Extracting from DataFrames 49
Selecting the First and Last Five Rows 49
Selecting a Specific Column in a DataFrame 50
Slicing Based on Row Number 50
Slicing Based on Row and Column Numbers 51
Slicing Based on Labels 52
Selecting a Single Cell in a DataFrame 54
Selecting Based on Cell Value 54
Transforming DataFrames 54
Checking to See If a Result Is a DataFrame or Series 55
Sorting Data in a DataFrame 55
Sorting by Index 55
Sorting by Value 56
Applying Functions to a DataFrame 57
Adding and Removing Rows and Columns in a DataFrame 60
Adding a Column 61
Removing Rows 61
Removing Columns 62
Generating a Crosstab 63
Chapter 4 Data Visualization Using matplotlib 67
What Is matplotlib? 67
Plotting Line Charts 68
Adding Title and Labels 69
Styling 69
Plotting Multiple Lines in the Same Chart 71
Adding a Legend 72
Plotting Bar Charts 73
Adding Another Bar to the Chart 74
Changing the Tick Marks 75
Plotting Pie Charts 77
Exploding the Slices 78
Displaying Custom Colors 79
Rotating the Pie Chart 80
Displaying a Legend 81
Saving the Chart 82
Plotting Scatter Plots 83
Combining Plots 83
Subplots 84
Plotting Using Seaborn 85
Displaying Categorical Plots 86
Displaying Lmplots 88
Displaying Swarmplots 90
Chapter 5 Getting Started with Scikit-learn for Machine Learning 93
Introduction to Scikit-learn 93
Getting Datasets 94
Using the Scikit-learn Dataset 94
Using the Kaggle Dataset 97
Using the UCI (University of California, Irvine) Machine Learning Repository 97
Generating Your Own Dataset 98
Linearly Distributed Dataset 98
Clustered Dataset 98
Clustered Dataset Distributed in Circular Fashion 100
Getting Started with Scikit-learn 100
Using the LinearRegression Class for Fitting the Model 101
Making Predictions 102
Plotting the Linear Regression Line 102
Getting the Gradient and Intercept of the Linear Regression Line 103
Examining the Performance of the Model by Calculating the Residual Sum of Squares 104
Evaluating the Model Using a Test Dataset 105
Persisting the Model 106
Data Cleansing 107
Cleaning Rows with NaNs 108
Replacing NaN with the Mean of the Column 109
Removing Rows 109
Removing Duplicate Rows 110
Normalizing Columns 112
Removing Outliers 113
Tukey Fences 113
Z-Score 116
Chapter 6 Supervised Learning: Linear Regression 119
Types of Linear Regression 119
Linear Regression 120
Using the Boston Dataset 120
Data Cleansing 125
Feature Selection 126
Multiple Regression 128
Training the Model 131
Getting the Intercept and Coefficients 133
Plotting the 3D Hyperplane 133
Polynomial Regression 135
Formula for Polynomial Regression 138
Polynomial Regression in Scikit-learn 138
Understanding Bias and Variance 141
Using Polynomial Multiple Regression on the Boston Dataset 144
Plotting the 3D Hyperplane 146
Chapter 7 Supervised Learning: Classification Using Logistic Regression 151
What Is Logistic Regression? 151
Understanding Odds 153
Logit Function 153
Sigmoid Curve 154
Using the Breast Cancer Wisconsin (Diagnostic) Data Set 156
Examining the Relationship Between Features 156
Plotting the Features in 2D 157
Plotting in 3D 158
Training Using One Feature 161
Finding the Intercept and Coefficient 162
Plotting the Sigmoid Curve 162
Making Predictions 163
Training the Model Using All Features 164
Testing the Model 166
Getting the Confusion Matrix 166
Computing Accuracy, Recall, Precision, and Other Metrics 168
Receiver Operating Characteristic (ROC) Curve 171
Plotting the ROC and Finding the Area Under the Curve (AUC) 174
Chapter 8 Supervised Learning: Classification Using Support Vector Machines 177
What Is a Support Vector Machine? 177
Maximum Separability 178
Support Vectors 179
Formula for the Hyperplane 180
Using Scikit-learn for SVM 181
Plotting the Hyperplane and the Margins 184
Making Predictions 185
Kernel Trick 186
Adding a Third Dimension 187
Plotting the 3D Hyperplane 189
Types of Kernels 191
C 194
Radial Basis Function (RBF) Kernel 196
Gamma 197
Polynomial Kernel 199
Using SVM for Real-Life Problems 200
Chapter 9 Supervised Learning: Classification Using K-Nearest Neighbors (KNN) 205
What Is K-Nearest Neighbors? 205
Implementing KNN in Python 206
Plotting the Points 206
Calculating the Distance Between the Points 207
Implementing KNN 208
Making Predictions 209
Visualizing Different Values of K 209
Using Scikit-learn's KNeighborsClassifier Class for KNN 211
Exploring Different Values of K 213
Cross-Validation 216
Parameter-Tuning K 217
Finding the Optimal K 218
Chapter 10 Unsupervised Learning: Clustering Using K-Means 221
What Is Unsupervised Learning? 221
Unsupervised Learning Using K-Means 222
How Clustering in K-Means Works 222
Implementing K-Means in Python 225
Using K-Means in Scikit-learn 230
Evaluating Cluster Size Using the Silhouette Coefficient 232
Calculating the Silhouette Coefficient 233
Finding the Optimal K 234
Using K-Means to Solve Real-Life Problems 236
Importing the Data 237
Cleaning the Data 237
Plotting the Scatter Plot 238
Clustering Using K-Means 239
Finding the Optimal Size Classes 240
Chapter 11 Using Azure Machine Learning Studio 243
What Is Microsoft Azure Machine Learning Studio? 243
An Example Using the Titanic Experiment 244
Using Microsoft Azure Machine Learning Studio 246
Uploading Your Dataset 247
Creating an Experiment 248
Filtering the Data and Making Fields Categorical 252
Removing the Missing Data 254
Splitting the Data for Training and Testing 254
Training a Model 256
Comparing Against Other Algorithms 258
Evaluating Machine Learning Algorithms 260
Publishing the Learning Model as a Web Service 261
Publishing the Experiment 261
Testing the Web Service 263
Programmatically Accessing the Web Service 263
Chapter 12 Deploying Machine Learning Models 269
Deploying ML 269
Case Study 270
Loading the Data 271
Cleaning the Data 271
Examining the Correlation Between the Features 273
Plotting the Correlation Between Features 274
Evaluating the Algorithms 277
Logistic Regression 277
K-Nearest Neighbors 277
Support Vector Machines 278
Selecting the Best Performing Algorithm 279
Training and Saving the Model 279
Deploying the Model 280
Testing the Model 282
Creating the Client Application to Use the Model 283
Index 285