Linear Regression: An Introductory Example

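  • Predict house prices with the public California housing dataset, which has 8 features and 1 target value
  • Expand the features into polynomial terms of degree at most 2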

Code example

import numpy as np
import pandas as pd
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# 1. Load the public dataset
data = fetch_california_housing()
print('California housing dataset description:')
print(data.DESCR)  # 20640 rows, 8 features; the target is the house value
np.set_printoptions(threshold=1000)
print('California housing features:')
pd.set_option('display.max_columns', None)
pd.set_option('display.width', 1000)
X = pd.DataFrame(data.data, columns=data.feature_names)  # wrap the features in a DataFrame
print(X)
print('California housing target:')
y = data.target  # house value for each feature row, in units of $100,000
print(y)

# 2. Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 3. Build a polynomial regression Pipeline:
#    standardization, polynomial expansion, linear regression
model = Pipeline([
    ("scaler", StandardScaler()),  # zero mean, unit variance
    ("poly", PolynomialFeatures(degree=2, include_bias=False)),  # terms of degree at most 2
    ("linear", LinearRegression()),  # linear regression
])

# 4. Fit the model
model.fit(X_train, y_train)

# 5. Predict
y_pred = model.predict(X_test)

# 6. Evaluate
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print(f"Mean squared error (MSE): {mse:.4f}")
print(f"Coefficient of determination (R²): {r2:.4f}")

# 7. Inspect the generated polynomial features
poly_feature_names = model.named_steps["poly"].get_feature_names_out(X.columns)
print("Polynomial features:")
print(poly_feature_names)  # 8 (original) + 8 (squares) + 28 (pairwise products) = 44

# 8. Inspect the fitted coefficients
linear = model.named_steps['linear']
print("Polynomial coefficients:")
print(linear.coef_)  # also 44 values, one per polynomial feature
print(linear.intercept_)

Output

California housing dataset description:
.. _california_housing_dataset:

California Housing dataset
--------------------------

**Data Set Characteristics:**

:Number of Instances: 20640

:Number of Attributes: 8 numeric, predictive attributes and the target

:Attribute Information:
    - MedInc        median income in block group
    - HouseAge      median house age in block group
    - AveRooms      average number of rooms per household
    - AveBedrms     average number of bedrooms per household
    - Population    block group population
    - AveOccup      average number of household members
    - Latitude      block group latitude
    - Longitude     block group longitude

:Missing Attribute Values: None

This dataset was obtained from the StatLib repository.
https://www.dcc.fc.up.pt/~ltorgo/Regression/cal_housing.html

The target variable is the median house value for California districts,
expressed in hundreds of thousands of dollars ($100,000).

This dataset was derived from the 1990 U.S. census, using one row per census
block group. A block group is the smallest geographical unit for which the U.S.
Census Bureau publishes sample data (a block group typically has a population
of 600 to 3,000 people).

A household is a group of people residing within a home. Since the average
number of rooms and bedrooms in this dataset are provided per household, these
columns may take surprisingly large values for block groups with few households
and many empty houses, such as vacation resorts.

It can be downloaded/loaded using the
:func:`sklearn.datasets.fetch_california_housing` function.

.. rubric:: References

- Pace, R. Kelley and Ronald Barry, Sparse Spatial Autoregressions,
  Statistics and Probability Letters, 33:291-297, 1997.

California housing features:
       MedInc  HouseAge  AveRooms  AveBedrms  Population  AveOccup  Latitude  Longitude
0      8.3252      41.0  6.984127   1.023810       322.0  2.555556     37.88    -122.23
1      8.3014      21.0  6.238137   0.971880      2401.0  2.109842     37.86    -122.22
2      7.2574      52.0  8.288136   1.073446       496.0  2.802260     37.85    -122.24
3      5.6431      52.0  5.817352   1.073059       558.0  2.547945     37.85    -122.25
4      3.8462      52.0  6.281853   1.081081       565.0  2.181467     37.85    -122.25
...       ...       ...       ...        ...         ...       ...       ...        ...
20635  1.5603      25.0  5.045455   1.133333       845.0  2.560606     39.48    -121.09
20636  2.5568      18.0  6.114035   1.315789       356.0  3.122807     39.49    -121.21
20637  1.7000      17.0  5.205543   1.120092      1007.0  2.325635     39.43    -121.22
20638  1.8672      18.0  5.329513   1.171920       741.0  2.123209     39.43    -121.32
20639  2.3886      16.0  5.254717   1.162264      1387.0  2.616981     39.37    -121.24

[20640 rows x 8 columns]
California housing target:
[4.526 3.585 3.521 ... 0.923 0.847 0.894]
Mean squared error (MSE): 0.4643
Coefficient of determination (R²): 0.6457
Polynomial features:
['MedInc' 'HouseAge' 'AveRooms' 'AveBedrms' 'Population' 'AveOccup'
 'Latitude' 'Longitude' 'MedInc^2' 'MedInc HouseAge' 'MedInc AveRooms'
 'MedInc AveBedrms' 'MedInc Population' 'MedInc AveOccup'
 'MedInc Latitude' 'MedInc Longitude' 'HouseAge^2' 'HouseAge AveRooms'
 'HouseAge AveBedrms' 'HouseAge Population' 'HouseAge AveOccup'
 'HouseAge Latitude' 'HouseAge Longitude' 'AveRooms^2'
 'AveRooms AveBedrms' 'AveRooms Population' 'AveRooms AveOccup'
 'AveRooms Latitude' 'AveRooms Longitude' 'AveBedrms^2'
 'AveBedrms Population' 'AveBedrms AveOccup' 'AveBedrms Latitude'
 'AveBedrms Longitude' 'Population^2' 'Population AveOccup'
 'Population Latitude' 'Population Longitude' 'AveOccup^2'
 'AveOccup Latitude' 'AveOccup Longitude' 'Latitude^2'
 'Latitude Longitude' 'Longitude^2']
Polynomial coefficients:
[ 0.93594011  0.13205802 -0.38759869  0.53020674  0.04051346 -1.78126342
 -1.27267893 -1.1676299  -0.11222558  0.03784584  0.17978116 -0.1201516
  0.11142996 -0.09883978 -0.66721635 -0.58616928  0.0332914  -0.01624672
  0.05234485  0.0360252  -0.27866746 -0.2767792  -0.25281254  0.06040245
 -0.10958604 -0.15473981  0.57792376  0.54353082  0.47907069  0.04954482
  0.24209969 -0.40169311 -0.48876332 -0.4228783   0.00195178  0.32361526
  0.03280047  0.01523969  0.00769438  0.50676749  0.36713809  0.2632096
  0.4351273   0.15301617]
1.956590491804413
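
To read the two scores in the output: the MSE is in squared target units, so its square root (RMSE) is easier to interpret. sqrt(0.4643) is about 0.68, which in units of $100,000 is a typical error of roughly $68,000. The sketch below recomputes both metrics by hand on made-up toy values and checks them against scikit-learn; the helper names mse_manual and r2_manual and the toy arrays are illustrative assumptions, not part of the original code.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def mse_manual(y_true, y_pred):
    """Mean of the squared residuals."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))


def r2_manual(y_true, y_pred):
    """1 - (residual sum of squares) / (total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Made-up toy values, not taken from the housing dataset.
y_true_toy = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_toy = np.array([1.5, 2.0, 2.5, 4.0])

mse_toy = mse_manual(y_true_toy, y_pred_toy)
r2_toy = r2_manual(y_true_toy, y_pred_toy)
rmse_toy = float(np.sqrt(mse_toy))

print(mse_toy)  # 0.125
print(r2_toy)   # 0.9

# The hand-written versions should agree with scikit-learn.
assert np.isclose(mse_toy, mean_squared_error(y_true_toy, y_pred_toy))
assert np.isclose(r2_toy, r2_score(y_true_toy, y_pred_toy))
```

An R² of 0.6457 therefore means the model's squared error on the test set is about 35% of what predicting the test-set mean for every row would give.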