
fastFM: A Python Factorization Machine Library

2019-10-18 23:15

Introduction to fastFM

fastFM's main selling point is that it wraps factorization machines behind a scikit-learn-style API, while the core is implemented in C, so performance is solid. fastFM supports three tasks: regression, classification, and ranking. Three solvers are available (als, mcmc, sgd), and the loss function depends on the task being solved.
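For reference, the model all of these solvers fit is the second-order factorization machine. A minimal NumPy sketch of its prediction function is shown below; the function and parameter names are illustrative, not fastFM's API, and it uses the standard O(n·k) reformulation of the pairwise term:

```python
import numpy as np

def fm_predict(X, w0, w, V):
    """Second-order factorization machine prediction.

    X:  (n_samples, n_features) design matrix
    w0: scalar bias
    w:  (n_features,) linear weights
    V:  (n_features, rank) latent factors

    Uses the identity
      sum_{i<j} <v_i, v_j> x_i x_j
        = 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ],
    which avoids the explicit O(n^2) pairwise sum.
    """
    linear = X @ w
    XV = X @ V                                    # (n_samples, rank)
    pairwise = 0.5 * ((XV ** 2).sum(axis=1) - ((X ** 2) @ (V ** 2)).sum(axis=1))
    return w0 + linear + pairwise
```

With two active features and latent vectors v_1 = [2], v_2 = [3], the pairwise term contributes exactly ⟨v_1, v_2⟩ = 6 to the prediction.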

Task            Solver           Loss
Regression      als, mcmc, sgd   Square Loss
Classification  als, mcmc, sgd   Probit (MAP), Probit, Sigmoid
Ranking         sgd              BPR

How do you choose among the solvers?

  • ALS:
    • Pros: fast prediction; fewer hyperparameters to tune than SGD
    • Cons: regularization must be set manually
  • SGD:
    • Pros: fast prediction; can iterate over large datasets
    • Cons: regularization, the other hyperparameters, and the step_size all have to be chosen by hand
  • MCMC:
    • Pros: very few hyperparameters; usually only the number of iterations, the initialization standard deviation, and the rank need to be set, and regularization is handled automatically
    • Cons: the test set has to be supplied during training (prediction happens inside fit_predict)
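The ALS-vs-SGD trade-off shows up even in a one-parameter least-squares toy problem: an ALS-style coordinate update solves for the weight in closed form (no step size needed), while SGD nudges it toward the optimum with a hand-picked learning rate. A minimal sketch under those assumptions, not fastFM's internals:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 3.0 * x + rng.normal(scale=0.1, size=100)
l2 = 0.1  # both solvers still require you to pick the regularization

# ALS-style update: minimize sum (y - w*x)^2 + l2*w^2 in closed form.
w_als = (x @ y) / (x @ x + l2)

# SGD on the same objective: every step needs a manually chosen step size.
w_sgd, step_size = 0.0, 0.01
for _ in range(50):                       # epochs
    for xi, yi in zip(x, y):
        grad = -2 * xi * (yi - w_sgd * xi) + 2 * l2 * w_sgd / len(x)
        w_sgd -= step_size * grad
```

Both recover a weight close to the true value of 3.0, but only after SGD's step_size is tuned; too large and it diverges, too small and it crawls. ALS avoids that knob entirely, which is why it needs fewer parameters.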

Using fastFM

1. Regression with ALS

from fastFM import als
from fastFM.datasets import make_user_item_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
from matplotlib import pyplot as plt

X, y, coef = make_user_item_regression(label_stdev=.4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

n_iter = 20
step_size = 1
l2_reg_w = 0.1
l2_reg_V = 0.1

fm = als.FMRegression(n_iter=0, l2_reg_w=l2_reg_w, l2_reg_V=l2_reg_V, rank=4)
# Allocates and initializes the model parameters.
fm.fit(X_train, y_train)

rmse_train = []
rmse_test = []
r2_score_train = []
r2_score_test = []

for i in range(1, n_iter):
    fm.fit(X_train, y_train, n_more_iter=step_size)
    y_pred_train = fm.predict(X_train)
    y_pred_test = fm.predict(X_test)

    rmse_train.append(np.sqrt(mean_squared_error(y_train, y_pred_train)))
    rmse_test.append(np.sqrt(mean_squared_error(y_test, y_pred_test)))

    r2_score_train.append(r2_score(y_train, y_pred_train))
    r2_score_test.append(r2_score(y_test, y_pred_test))


fig, axes = plt.subplots(ncols=2, figsize=(15, 4))
x = np.arange(1, n_iter) * step_size
with plt.style.context('fivethirtyeight'):
    axes[0].plot(x, rmse_train, label='RMSE-train', color='r', ls="--")
    axes[0].plot(x, rmse_test, label='RMSE-test', color='r')
    axes[1].plot(x, r2_score_train, label='R^2-train', color='b', ls="--")
    axes[1].plot(x, r2_score_test, label='R^2-test', color='b')
axes[0].set_ylabel('RMSE', color='r')
axes[1].set_ylabel('R^2', color='b')
axes[0].legend()
axes[1].legend()
plt.show()

2. Regression with MCMC

from fastFM.datasets import make_user_item_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
from matplotlib import pyplot as plt
from fastFM import mcmc

n_iter = 100
step_size = 10
seed = 123
rank = 3

X, y, coef = make_user_item_regression(label_stdev=.4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33)

fm = mcmc.FMRegression(n_iter=0, rank=rank, random_state=seed)
# Allocates and initializes the model and hyper parameters.
fm.fit_predict(X_train, y_train, X_test)

rmse_test = []
rmse_new = []
hyper_param = np.zeros((n_iter - 1, 3 + 2 * rank), dtype=np.float64)
for nr, i in enumerate(range(1, n_iter)):
    fm.random_state = i * seed
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=step_size)
    rmse_test.append(np.sqrt(mean_squared_error(y_pred, y_test)))
    hyper_param[nr, :] = fm.hyper_param_

values = np.arange(1, n_iter)
x = values * step_size
burn_in = 5
x = x[burn_in:]

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(15, 8))

axes[0, 0].plot(x, rmse_test[burn_in:], label='test rmse', color="r")
axes[0, 0].legend()
axes[0, 1].plot(x, hyper_param[burn_in:, 0], label='alpha', color="b")
axes[0, 1].legend()
axes[1, 0].plot(x, hyper_param[burn_in:, 1], label='lambda_w', color="g")
axes[1, 0].legend()
axes[1, 1].plot(x, hyper_param[burn_in:, 3], label='mu_w', color="g")
axes[1, 1].legend()
plt.show()

Note: fastFM expects its feature matrices in scipy CSR format. If your data is in a pandas DataFrame, convert it first:

import scipy.sparse

X_train = scipy.sparse.csr_matrix(X_train.values)
X_test = scipy.sparse.csr_matrix(X_test.values)
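In recommendation settings, the CSR matrix is typically built by one-hot encoding user and item IDs side by side, so each row has exactly two non-zeros. A small sketch with illustrative data (the IDs and sizes below are made up for the example):

```python
import numpy as np
import scipy.sparse as sp

# Hypothetical interaction log: (user_id, item_id) pairs.
users = np.array([0, 0, 1, 2])
items = np.array([1, 2, 0, 2])
n_users, n_items = 3, 3

# One-hot encode users in columns [0, n_users) and items in
# columns [n_users, n_users + n_items): two non-zeros per row.
rows = np.repeat(np.arange(len(users)), 2)
cols = np.stack([users, n_users + items], axis=1).ravel()
data = np.ones(len(rows))
X = sp.csr_matrix((data, (rows, cols)), shape=(len(users), n_users + n_items))
```

The resulting (4, 6) matrix is exactly the kind of sparse design matrix that make_user_item_regression produces and that the FM's pairwise term exploits.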

