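Machine-Learning-Based Concrete Compressive Strength Prediction, with Model Deployment as an API Using Docker and FastAPI
Deploying a Machine Learning Model with Docker and FastAPI
Building and deploying a machine learning model usually involves four steps: building the model, creating an API that serves its predictions, containerizing the API, and deploying it to the cloud. For environment setup, see "基于WSL2+Docker+VScode搭建机器学习(深度学习)开发环境" (setting up a machine learning / deep learning development environment with WSL2, Docker, and VS Code).
This notebook covers:
- building a machine learning model with Scikit-learn;
- creating a REST API with FastAPI to serve model predictions;
- containerizing the API with Docker.
View online: 利用Docker与FastAPI部署机器学习模型 (Jupyter notebook)
Project directory tree: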
.ML-利用Docker与FastAPI部署机器学习模型
├── Appendix-files
│   ├── Readme.md
│   └── figures
│       ├── Postman_result.png
│       ├── containerize-app1.png
│       └── model-deployment.png
├── Dockerfile
├── ML-利用Docker与FastAPI部署机器学习模型.ipynb
├── app
│   ├── __init__.py
│   └── main.py
├── model
│   └── RF_model.pkl
└── requirements.txt
from IPython.display import Image
Image(filename='./Appendix-files/figures/model-deployment.png', width=550)
!pip install -i https://pypi.tuna.tsinghua.edu.cn/simple pandas scikit-learn fastapi uvicorn matplotlib
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: pandas in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (2.2.2)
Requirement already satisfied: scikit-learn in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (1.0.2)
Requirement already satisfied: fastapi in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (0.115.2)
Requirement already satisfied: uvicorn in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (0.32.0)
Requirement already satisfied: matplotlib in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (3.9.2)
Requirement already satisfied: numpy>=1.22.4 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pandas) (1.26.4)
Requirement already satisfied: python-dateutil>=2.8.2 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pandas) (2.9.0.post0)
Requirement already satisfied: pytz>=2020.1 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pandas) (2024.1)
Requirement already satisfied: tzdata>=2022.7 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pandas) (2023.3)
Requirement already satisfied: scipy>=1.1.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from scikit-learn) (1.13.1)
Requirement already satisfied: joblib>=0.11 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from scikit-learn) (1.4.2)
Requirement already satisfied: threadpoolctl>=2.0.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from scikit-learn) (3.5.0)
Requirement already satisfied: starlette<0.41.0,>=0.37.2 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from fastapi) (0.40.0)
Requirement already satisfied: pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from fastapi) (2.9.2)
Requirement already satisfied: typing-extensions>=4.8.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from fastapi) (4.11.0)
Requirement already satisfied: click>=7.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from uvicorn) (8.1.7)
Requirement already satisfied: h11>=0.8 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from uvicorn) (0.14.0)
Requirement already satisfied: contourpy>=1.0.1 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (1.3.0)
Requirement already satisfied: cycler>=0.10 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (0.12.1)
Requirement already satisfied: fonttools>=4.22.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (4.54.1)
Requirement already satisfied: kiwisolver>=1.3.1 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (1.4.7)
Requirement already satisfied: packaging>=20.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (24.1)
Requirement already satisfied: pillow>=8 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (11.0.0)
Requirement already satisfied: pyparsing>=2.3.1 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (3.2.0)
Requirement already satisfied: importlib-resources>=3.2.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from matplotlib) (6.4.5)
Requirement already satisfied: colorama in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from click>=7.0->uvicorn) (0.4.6)
Requirement already satisfied: zipp>=3.1.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from importlib-resources>=3.2.0->matplotlib) (3.20.2)
Requirement already satisfied: annotated-types>=0.6.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (0.7.0)
Requirement already satisfied: pydantic-core==2.23.4 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from pydantic!=1.8,!=1.8.1,!=2.0.0,!=2.0.1,!=2.1.0,<3.0.0,>=1.7.4->fastapi) (2.23.4)
Requirement already satisfied: six>=1.5 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from python-dateutil>=2.8.2->pandas) (1.16.0)
Requirement already satisfied: anyio<5,>=3.4.0 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from starlette<0.41.0,>=0.37.2->fastapi) (4.6.2.post1)
Requirement already satisfied: idna>=2.8 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from anyio<5,>=3.4.0->starlette<0.41.0,>=0.37.2->fastapi) (3.10)
Requirement already satisfied: sniffio>=1.1 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from anyio<5,>=3.4.0->starlette<0.41.0,>=0.37.2->fastapi) (1.3.1)
Requirement already satisfied: exceptiongroup>=1.0.2 in c:\users\xiaoyao\.conda\envs\mldeploy\lib\site-packages (from anyio<5,>=3.4.0->starlette<0.41.0,>=0.37.2->fastapi) (1.2.0)
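Building the machine learning model
This section builds a machine learning model, using concrete compressive strength prediction as the example.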
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
%pwd
'd:\\Code_folder\\machine_learning\\Useful-Python-Scripts\\Scripts_folder\\ML-利用Docker与FastAPI部署机器学习模型'
# Load the data.
df = pd.read_csv('../../Datasets/混凝土抗压强度预测/concrete_data.csv')
print("DF shape:", df.shape)
df.tail()
DF shape: (1030, 9)
Index | Cement | Blast Furnace Slag | Fly Ash | Water | Superplasticizer | Coarse Aggregate | Fine Aggregate | Age | Strength |
---|---|---|---|---|---|---|---|---|---|
1025 | 276.4 | 116.0 | 90.3 | 179.6 | 8.9 | 870.1 | 768.3 | 28 | 44.28 |
1026 | 322.2 | 0.0 | 115.6 | 196.0 | 10.4 | 817.9 | 813.4 | 28 | 31.18 |
1027 | 148.5 | 139.4 | 108.6 | 192.7 | 6.1 | 892.4 | 780.0 | 28 | 23.70 |
1028 | 159.1 | 186.7 | 0.0 | 175.6 | 11.3 | 989.6 | 788.9 | 28 | 32.77 |
1029 | 260.9 | 100.5 | 78.3 | 200.6 | 8.6 | 864.5 | 761.5 | 28 | 32.40 |
df.describe().T
Column | count | mean | std | min | 25% | 50% | 75% | max |
---|---|---|---|---|---|---|---|---|
Cement | 1030.0 | 281.167864 | 104.506364 | 102.00 | 192.375 | 272.900 | 350.000 | 540.0 |
Blast Furnace Slag | 1030.0 | 73.895825 | 86.279342 | 0.00 | 0.000 | 22.000 | 142.950 | 359.4 |
Fly Ash | 1030.0 | 54.188350 | 63.997004 | 0.00 | 0.000 | 0.000 | 118.300 | 200.1 |
Water | 1030.0 | 181.567282 | 21.354219 | 121.80 | 164.900 | 185.000 | 192.000 | 247.0 |
Superplasticizer | 1030.0 | 6.204660 | 5.973841 | 0.00 | 0.000 | 6.400 | 10.200 | 32.2 |
Coarse Aggregate | 1030.0 | 972.918932 | 77.753954 | 801.00 | 932.000 | 968.000 | 1029.400 | 1145.0 |
Fine Aggregate | 1030.0 | 773.580485 | 80.175980 | 594.00 | 730.950 | 779.500 | 824.000 | 992.6 |
Age | 1030.0 | 45.662136 | 63.169912 | 1.00 | 7.000 | 28.000 | 56.000 | 365.0 |
Strength | 1030.0 | 35.817961 | 16.705742 | 2.33 | 23.710 | 34.445 | 46.135 | 82.6 |
# Print all column names.
print(list(df.columns))
['Cement', 'Blast Furnace Slag', 'Fly Ash', 'Water', 'Superplasticizer', 'Coarse Aggregate', 'Fine Aggregate', 'Age', 'Strength']
# Replace spaces in the column names with underscores to avoid problems later on.
df.columns = ['Cement', 'Blast_Furnace_Slag', 'Fly_Ash', 'Water', 'Superplasticizer', 'Coarse_Aggregate', 'Fine_Aggregate', 'Age', 'Strength']
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1030 entries, 0 to 1029
Data columns (total 9 columns):
 #   Column              Non-Null Count  Dtype
---  ------              --------------  -----
 0   Cement              1030 non-null   float64
 1   Blast_Furnace_Slag  1030 non-null   float64
 2   Fly_Ash             1030 non-null   float64
 3   Water               1030 non-null   float64
 4   Superplasticizer    1030 non-null   float64
 5   Coarse_Aggregate    1030 non-null   float64
 6   Fine_Aggregate      1030 non-null   float64
 7   Age                 1030 non-null   int64
 8   Strength            1030 non-null   float64
dtypes: float64(8), int64(1)
memory usage: 72.5 KB
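# Outlier handling.
def remove_outliers(df, col_name):
    Q1 = df[col_name].quantile(0.25)
    Q3 = df[col_name].quantile(0.75)
    IQR = Q3 - Q1
    lower, upper = Q1 - 1.5 * IQR, Q3 + 1.5 * IQR
    # Despite the function name, no rows are dropped: values outside the
    # 1.5*IQR fences are capped at the nearest fence.
    df[col_name] = df[col_name].apply(lambda x: lower if x < lower else (upper if x > upper else x))
    return df

# Apply to every numeric column except the last one (the target, Strength).
for col in df.select_dtypes(exclude='object').columns[:-1]:
    df = remove_outliers(df, col)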
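To make the capping rule above concrete, here is a minimal standard-library sketch of the same 1.5*IQR logic on a small list. The `ages` values are made up for illustration and are not taken from the dataset; `method="inclusive"` is chosen on the assumption that it corresponds to the linear interpolation pandas uses by default in `Series.quantile`.

```python
import statistics


def iqr_fences(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    # "inclusive" interpolates linearly between data points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Cap every value at the nearest fence instead of dropping it."""
    lower, upper = iqr_fences(values, k)
    return [min(max(v, lower), upper) for v in values]


# Hypothetical curing ages; 365 is far outside the bulk of the data.
ages = [1, 3, 7, 14, 28, 28, 56, 90, 365]
fences = iqr_fences(ages)
clipped = clip_outliers(ages)
print(fences)
print(clipped)
```

Only the value 365 falls outside the fences here, so it is the only one that changes; the list keeps its original length.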
# Split into training and test sets.
from sklearn.model_selection import train_test_split

X = df.iloc[:, :-1].values
y = df.iloc[:, -1].values
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Standardize the features (fit on the training set only, then apply to the test set).
from sklearn.preprocessing import StandardScaler

s = StandardScaler()
X_train = s.fit_transform(X_train)
X_test = s.transform(X_test)
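The fit/transform split above matters because the test set must be scaled with statistics learned from the training set only. A minimal pure-Python sketch of that idea for one column follows; the `train_water` and `test_water` values are hypothetical, and the helper names are illustrative rather than part of scikit-learn. Like `StandardScaler`, it uses the population standard deviation.

```python
import statistics


def fit_scaler(column):
    """Learn mean and population standard deviation from training data only."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Guard against constant columns to avoid dividing by zero.
    return mean, (std if std > 0 else 1.0)


def transform(column, mean, std):
    """Apply previously learned statistics to any data (train or test)."""
    return [(x - mean) / std for x in column]


train_water = [160.0, 180.0, 200.0]  # hypothetical training values
test_water = [190.0]                 # hypothetical unseen value

mean, std = fit_scaler(train_water)
train_scaled = transform(train_water, mean, std)
test_scaled = transform(test_water, mean, std)
print(mean, std)
print(train_scaled, test_scaled)
```

The same `mean` and `std` would also have to be applied to any raw input sent to the deployed model later on.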
# Model training and evaluation.
from sklearn.metrics import r2_score, mean_absolute_error

def model_train(model, X_train, X_test, y_train, y_test):
    # Fit on the training set, then report metrics on the held-out test set.
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("R2 Score:", r2_score(y_test, y_pred))
    print("Mean Absolute Error:", mean_absolute_error(y_test, y_pred))
    # print("Root Mean Squared Error:", root_mean_squared_error(y_test, y_pred))
    return model
# Linear regression.
from sklearn.linear_model import LinearRegression

model_train(LinearRegression(), X_train, X_test, y_train, y_test)
R2 Score: 0.6578667869972392
Mean Absolute Error: 7.385149849022975
LinearRegression()
# Random forest.
from sklearn.ensemble import RandomForestRegressor
RF_model = model_train(RandomForestRegressor(), X_train, X_test, y_train, y_test)
R2 Score: 0.8904714777790299
Mean Absolute Error: 3.6679456270226494
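For readers unfamiliar with the two metrics printed above, the sketch below computes them by hand from their standard definitions: MAE is the average absolute residual, and R² is one minus the ratio of the residual sum of squares to the total sum of squares. The `y_true` and `y_pred` values are invented for illustration, and the function names `mae` and `r2` are hypothetical stand-ins for the scikit-learn functions.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical strengths and predictions.
y_true = [30.0, 40.0, 50.0]
y_pred = [32.0, 38.0, 50.0]

mae_value = mae(y_true, y_pred)
r2_value = r2(y_true, y_pred)
print(mae_value, r2_value)
```

Read this way, the random forest's higher R² and lower MAE compared with linear regression both point in the same direction: smaller residuals on the test set.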
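Model persistence
import os
import pickle

# Create a directory to store the model.
os.makedirs('./model', exist_ok=True)

# Save the model.
# Note: only the model is saved here; the StandardScaler `s` fitted above is not.
with open('./model/RF_model.pkl', 'wb') as f:
    pickle.dump(RF_model, f)
print('Model saved successfully.')
Model saved successfully.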
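The model above was trained on standardized features, but only `RF_model` is pickled, so whoever loads it has to reproduce the scaling separately. One possible way around this, sketched below as an alternative and not as what this notebook does, is to wrap the scaler and the model in a scikit-learn `Pipeline` and pickle that single object. The data here is synthetic stand-in data (not the concrete dataset), and the in-memory `pickle.dumps` round trip replaces the file write purely to keep the example self-contained.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: 60 rows, 8 features, one noisy target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0.0, 500.0, size=(60, 8))
y_demo = X_demo[:, 0] * 0.1 + rng.normal(0.0, 1.0, size=60)

# The pipeline scales raw inputs and then predicts, in one object.
pipeline = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=20, random_state=0),
)
pipeline.fit(X_demo, y_demo)

# Round-trip through pickle, as the notebook does with a file.
blob = pickle.dumps(pipeline)
restored = pickle.loads(blob)

# Raw (unscaled) rows can be passed directly to the restored pipeline.
raw_sample = X_demo[:3]
same = np.allclose(pipeline.predict(raw_sample), restored.predict(raw_sample))
print(same)
```

With this layout, the serving code can pass raw feature values to `predict` without knowing anything about the scaler.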
Creating the FastAPI application
# Saved separately as app/main.py.
from fastapi import FastAPI
from pydantic import BaseModel
import pickle
import os

# Define the input schema.
# df.columns = ['Cement', 'Blast_Furnace_Slag', 'Fly_Ash', 'Water', 'Superplasticizer', 'Coarse_Aggregate', 'Fine_Aggregate', 'Age', 'Strength']
class InputData(BaseModel):
    Cement: float              # cement content
    Blast_Furnace_Slag: float  # blast furnace slag content
    Fly_Ash: float             # fly ash content
    Water: float               # water content
    Superplasticizer: float    # superplasticizer content
    Coarse_Aggregate: float    # coarse aggregate content
    Fine_Aggregate: float      # fine aggregate content
    Age: int                   # concrete age

# Example input.
"""
input_data = InputData(
    Cement=300.0,
    Blast_Furnace_Slag=100.0,
    Fly_Ash=50.0,
    Water=200.0,
    Superplasticizer=5.0,
    Coarse_Aggregate=1000.0,
    Fine_Aggregate=500.0,
    Age=28
)
"""

# Initialize a FastAPI app.
app = FastAPI(title='Concrete Compress Strength Prediction API')

# Load the trained machine learning model file.
model_path = os.path.join("./model", "RF_model.pkl")
with open(model_path, 'rb') as f:
    model = pickle.load(f)

@app.post("/predict")
def predict(data: InputData):
    # Input features for prediction, in the same column order used for training.
    # Caveat: RF_model was trained on StandardScaler-transformed features, while
    # the raw values are passed here. For consistent predictions, the fitted
    # scaler would also need to be saved and applied before calling predict.
    input_features = [[data.Cement, data.Blast_Furnace_Slag, data.Fly_Ash, data.Water,
                       data.Superplasticizer, data.Coarse_Aggregate, data.Fine_Aggregate, data.Age]]
    # Run the model.
    prediction = model.predict(input_features)
    # Return the prediction as a plain Python float so it serializes cleanly to JSON.
    return {"Predicted_Compress_Strength": float(prediction[0])}
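The endpoint above builds the feature list by hand, which only works if the order matches the training columns exactly. A small sketch of one way to make that order explicit is shown below; `FEATURE_ORDER` and `to_feature_row` are hypothetical names introduced for illustration and are not part of the original main.py.

```python
# Column order used when the model was trained (all columns except Strength).
FEATURE_ORDER = (
    "Cement", "Blast_Furnace_Slag", "Fly_Ash", "Water",
    "Superplasticizer", "Coarse_Aggregate", "Fine_Aggregate", "Age",
)


def to_feature_row(payload):
    """Turn a dict of inputs into a single-row 2D list in training order."""
    missing = [name for name in FEATURE_ORDER if name not in payload]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return [[float(payload[name]) for name in FEATURE_ORDER]]


# The dict order is deliberately shuffled; the output order is still fixed.
payload = {
    "Age": 28, "Water": 200.0, "Cement": 300.0, "Fly_Ash": 50.0,
    "Blast_Furnace_Slag": 100.0, "Fine_Aggregate": 500.0,
    "Coarse_Aggregate": 1000.0, "Superplasticizer": 5.0,
}
row = to_feature_row(payload)
print(row)
```

Inside a FastAPI handler, the same helper could in principle be fed the request model converted to a dict, so that the order is defined in one place.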
Containerizing with Docker
# Create the Dockerfile.
# Use Python 3.9 as the base image.
FROM python:3.9-slim
# Create and set a working directory.
RUN mkdir -p /Code
WORKDIR /Code
# Copy the project's requirements.txt into the working directory.
COPY requirements.txt /Code/requirements.txt
# Install the dependencies.
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple --no-cache-dir --upgrade -r /Code/requirements.txt
# Copy the app directory into the working directory.
COPY ./app /Code/app
# Copy the model directory into the working directory.
COPY ./model /Code/model
# Expose port 80.
EXPOSE 80
# Startup command.
CMD [ "uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "80"]
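The Dockerfile installs from requirements.txt, whose contents are not shown in this notebook. Because scikit-learn does not guarantee that a pickled model loads correctly under a different library version, it may help to pin the container to the versions used for training. The sketch below generates pinned lines from the current environment; the function name `pin_requirements` and the package list are assumptions for illustration, not the project's actual requirements file.

```python
from importlib import metadata


def pin_requirements(packages, get_version=metadata.version):
    """Return requirements.txt text with each package pinned to its installed version."""
    lines = []
    for name in packages:
        try:
            lines.append(f"{name}=={get_version(name)}")
        except metadata.PackageNotFoundError:
            # Not installed in this environment: leave the entry unpinned.
            lines.append(name)
    return "\n".join(lines) + "\n"


# Assumed minimal set needed by app/main.py to load and serve the model.
print(pin_requirements(["scikit-learn", "fastapi", "uvicorn"]), end="")
```

Run in the training environment, the printed text could be saved as requirements.txt before building the image.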
Building the Docker image
Image("./Appendix-files/figures/containerize-app1.png", width=550)
$ docker build -t concrete-compress-prediction-api .
$ docker run -p 80:80 concrete-compress-prediction-api
Accessing the API
You can test the endpoint with Postman, or run the command below in bash:
Image("./Appendix-files/figures/Postman_result.png", width=550)
curl -X 'POST' \
  'http://127.0.0.1:80/predict' \
  -H 'Content-Type: application/json' \
  -d '{
    "Cement": 300.0,
    "Blast_Furnace_Slag": 100.0,
    "Fly_Ash": 50.0,
    "Water": 200.0,
    "Superplasticizer": 5.0,
    "Coarse_Aggregate": 1000.0,
    "Fine_Aggregate": 500.0,
    "Age": 28
  }'
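The same request can also be sent from Python. The sketch below uses only the standard library; `build_predict_request` and `predict` are hypothetical helper names, and the base URL assumes the container is running locally with port 80 published as in the `docker run` command above. Building the request is kept separate from sending it, so the payload can be inspected without a running server.

```python
import json
import urllib.request


def build_predict_request(features, base_url="http://127.0.0.1:80"):
    """Build a POST request for /predict with a JSON body."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(features, base_url="http://127.0.0.1:80", timeout=10):
    """Send the request and decode the JSON response (needs the container running)."""
    req = build_predict_request(features, base_url)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


sample = {
    "Cement": 300.0, "Blast_Furnace_Slag": 100.0, "Fly_Ash": 50.0,
    "Water": 200.0, "Superplasticizer": 5.0, "Coarse_Aggregate": 1000.0,
    "Fine_Aggregate": 500.0, "Age": 28,
}
request = build_predict_request(sample)
print(request.get_method(), request.full_url)
# With the container running, predict(sample) should return a dict
# containing the "Predicted_Compress_Strength" key.
```

Note that the body is valid JSON with `:` between keys and values; a payload using `=` would be rejected by FastAPI as unparseable.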