Introduction to fastFM
fastFM's main appeal is that it wraps factorization machines behind a scikit-learn-style API, with the core implemented in C, so performance is solid. It covers three kinds of problems: regression, classification, and ranking. Three solvers are available (als, mcmc, sgd), and the loss function depends on the task:
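As background, the model fastFM fits is a factorization machine: a global bias plus linear weights plus factorized pairwise interactions. A minimal numpy sketch of the prediction equation (variable names here are illustrative, not fastFM's internals), using the standard identity that lets the pairwise sum be computed in O(n·k) instead of O(n²):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Factorization machine score for a single sample x.

    w0: global bias, w: linear weights of shape (n,),
    V: factor matrix of shape (n, k).
    Pairwise term via the identity:
    sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f ((V^T x)_f^2 - sum_i V_{i,f}^2 x_i^2)
    """
    linear = w0 + w @ x
    interactions = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interactions

def fm_predict_naive(x, w0, w, V):
    """Same score via the explicit O(n^2) double loop, for comparison."""
    s = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            s += (V[i] @ V[j]) * x[i] * x[j]
    return s
```

Both functions agree to numerical precision; the fast form is what makes FM training linear in the number of non-zero features.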
| Task | Solver | Loss |
|------|--------|------|
| Regression | als, mcmc, sgd | Square Loss |
| Classification | als, mcmc, sgd | Probit (MAP), Probit, Sigmoid |
| Ranking | sgd | BPR |
How should you choose between the solvers?
- ALS:
  - Pros: fast prediction; fewer hyperparameters to set than SGD
  - Cons: regularization strengths must be chosen by hand
- SGD:
  - Pros: fast prediction; can iterate over very large datasets
  - Cons: regularization and the other hyperparameters, including the step size (step_size), must all be set manually
- MCMC:
  - Pros: needs very few hyperparameters; usually only the number of iterations, the initialization variance, and the rank have to be specified, and regularization is handled automatically
  - Cons: the test set must be supplied during training (predictions are made as part of fitting)
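To make the SGD caveat concrete, here is a single squared-loss SGD update for an FM in plain numpy. This is an illustrative sketch of the technique, not fastFM's C implementation; both the learning rate `lr` and the regularization strengths `l2_w`, `l2_V` are exactly the knobs that must be hand-tuned:

```python
import numpy as np

def predict(x, w0, w, V):
    # FM score: bias + linear term + factorized pairwise interactions.
    return w0 + w @ x + 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))

def sgd_step(x, y, w0, w, V, lr=0.01, l2_w=0.0, l2_V=0.0):
    """One SGD update on a single (x, y) pair under squared loss."""
    err = predict(x, w0, w, V) - y       # the constant 2 is folded into lr
    w0_new = w0 - lr * err
    w_new = w - lr * (err * x + l2_w * w)
    # d(yhat)/dV[i, f] = x_i * (V^T x)_f - V[i, f] * x_i^2
    Vx = x @ V
    grad_V = np.outer(x, Vx) - V * (x ** 2)[:, None]
    V_new = V - lr * (err * grad_V + l2_V * V)
    return w0_new, w_new, V_new
```

With a small enough `lr`, repeated steps shrink the loss; too large a step diverges, which is why fastFM's SGD solver exposes the step size as an explicit parameter.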
Using fastFM
1. Regression with ALS
```python
from fastFM import als
from fastFM.datasets import make_user_item_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
from matplotlib import pyplot as plt

X, y, coef = make_user_item_regression(label_stdev=.4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,
                                                    random_state=42)

n_iter = 20
step_size = 1
l2_reg_w = 0.1
l2_reg_V = 0.1

fm = als.FMRegression(n_iter=0, l2_reg_w=l2_reg_w, l2_reg_V=l2_reg_V, rank=4)
# Allocates and initializes the model parameters.
fm.fit(X_train, y_train)

rmse_train = []
rmse_test = []
r2_score_train = []
r2_score_test = []

# Continue training `step_size` iterations at a time and record the metrics.
for i in range(1, n_iter):
    fm.fit(X_train, y_train, n_more_iter=step_size)
    rmse_train.append(np.sqrt(mean_squared_error(y_train, fm.predict(X_train))))
    rmse_test.append(np.sqrt(mean_squared_error(y_test, fm.predict(X_test))))
    r2_score_train.append(r2_score(y_train, fm.predict(X_train)))
    r2_score_test.append(r2_score(y_test, fm.predict(X_test)))

fig, axes = plt.subplots(ncols=2, figsize=(15, 4))
x = np.arange(1, n_iter) * step_size
with plt.style.context('fivethirtyeight'):
    axes[0].plot(x, rmse_train, label='RMSE-train', color='r', ls="--")
    axes[0].plot(x, rmse_test, label='RMSE-test', color='r')
    axes[1].plot(x, r2_score_train, label='R^2-train', color='b', ls="--")
    axes[1].plot(x, r2_score_test, label='R^2-test', color='b')
axes[0].set_ylabel('RMSE', color='r')
axes[1].set_ylabel('R^2', color='b')
axes[0].legend()
axes[1].legend()
plt.show()
```
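For reference, the two metrics tracked above are simple to compute by hand. A small numpy version, equivalent to sklearn's `mean_squared_error` (under a square root) and `r2_score` for these inputs:

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root mean squared error.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def r2(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot
```

Note that R² is not symmetric in its arguments: the convention is `r2(y_true, y_pred)`, with the ground truth first.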
2. Regression with MCMC
```python
from fastFM import mcmc
from fastFM.datasets import make_user_item_regression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import numpy as np
from matplotlib import pyplot as plt

n_iter = 100
step_size = 10
seed = 123
rank = 3

X, y, coef = make_user_item_regression(label_stdev=.4)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)

fm = mcmc.FMRegression(n_iter=0, rank=rank, random_state=seed)
# Allocates and initializes the model and hyperparameters.
fm.fit_predict(X_train, y_train, X_test)

rmse_test = []
hyper_param = np.zeros((n_iter - 1, 3 + 2 * rank), dtype=np.float64)

# Draw `step_size` more samples per call, tracking test RMSE and the
# hyperparameter samples along the way.
for nr, i in enumerate(range(1, n_iter)):
    fm.random_state = i * seed
    y_pred = fm.fit_predict(X_train, y_train, X_test, n_more_iter=step_size)
    rmse_test.append(np.sqrt(mean_squared_error(y_test, y_pred)))
    hyper_param[nr, :] = fm.hyper_param_

values = np.arange(1, n_iter)
x = values * step_size
burn_in = 5
x = x[burn_in:]

fig, axes = plt.subplots(nrows=2, ncols=2, sharex=True, figsize=(15, 8))
axes[0, 0].plot(x, rmse_test[burn_in:], label='test rmse', color="r")
axes[0, 1].plot(x, hyper_param[burn_in:, 0], label='alpha', color="b")
axes[1, 0].plot(x, hyper_param[burn_in:, 1], label='lambda_w', color="g")
axes[1, 1].plot(x, hyper_param[burn_in:, 3], label='mu_w', color="g")
for ax in axes.flat:
    ax.legend()
plt.show()
```
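The `burn_in` variable above discards the early MCMC draws, before the chain has mixed; what remains is typically averaged to form a posterior estimate. A tiny numpy sketch of that post-processing step (illustrative only; fastFM handles the averaging of predictions internally across `fit_predict` calls):

```python
import numpy as np

def posterior_mean(samples, burn_in):
    """Average per-draw values after dropping the first `burn_in` draws.

    samples: array of shape (n_draws, ...) holding one value per MCMC draw,
    e.g. per-draw hyperparameter samples or per-draw predictions.
    """
    samples = np.asarray(samples, dtype=float)
    if burn_in >= len(samples):
        raise ValueError("burn_in must be smaller than the number of draws")
    return samples[burn_in:].mean(axis=0)
```

The first draws are far from the stationary distribution (the large early RMSE values in the plot), so including them would bias the estimate.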
Note: fastFM expects its feature matrix in scipy CSR (compressed sparse row) format; a pandas DataFrame must be converted first.
```python
import scipy.sparse

X_train = scipy.sparse.csr_matrix(X_train.values)
X_test = scipy.sparse.csr_matrix(X_test.values)
```
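A quick self-contained illustration of the conversion, with a small dense array standing in for a DataFrame's `.values` (the data here is made up):

```python
import numpy as np
import scipy.sparse

# A dense design matrix such as one obtained from DataFrame.values;
# one-hot style features are mostly zeros, so CSR stores them compactly.
X_dense = np.array([
    [1.0, 0.0, 0.0, 5.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
])
X_csr = scipy.sparse.csr_matrix(X_dense)

print(X_csr.shape)  # (3, 4)
print(X_csr.nnz)    # 6 -- only the non-zero entries are stored
```

CSR keeps just the non-zero values plus row/column index arrays, which is why it scales to the very wide, sparse one-hot matrices typical of recommender data.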