Training a model in the cloud with Modelbit is simple! In this post we'll train a small model locally and then train the same model in the cloud on much more data.
Let's start by getting some data from which to build our model. First, log in to Modelbit:
import modelbit
mb = modelbit.login()
On our team, we've agreed on this local training subset to get started. It's 10,000 rows of NBA game stats data, enough to get started locally! Let's download it:
df = mb.get_dataset("nba games local training subset")
df
|  | GAME_ID | TEAM_ID | TEAM_ABBR | TEAM_CITY | PLAYER_ID | PLAYER_NAME | PLAYER_NICKNAME | START_POSITION | COMMMENTS | MINUTES_PLAYED | FGM | PTS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 21701043 | 1610612758 | SAC | Sacramento | 1628382 | Justin Jackson | NaN | F | NaN | 19:10 | 1 | 2 |
| 1 | 21000662 | 1610612759 | SAS | San Antonio | 1938 | Manu Ginobili | NaN | G | NaN | 31:21 | 5 | 24 |
| 2 | 20901168 | 1610612762 | UTA | Utah | 202178 | Sundiata Gaines | NaN | NaN | NaN | 13:52 | 2 | 6 |
| 3 | 20600509 | 1610612737 | ATL | Atlanta | 101159 | Dijon Thompson | NaN | NaN | NaN | 5:56 | 2 | 11 |
| 4 | 20801087 | 1610612745 | HOU | Houston | 200778 | James White | NaN | NaN | NaN | 4:29 | 0 | 1 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9995 | 21700993 | 1610612741 | CHI | Chicago | 203897 | Zach LaVine | NaN | F | NaN | 28:22 | 7 | 21 |
| 9996 | 21300659 | 1610612742 | DAL | Dallas | 1713 | Vince Carter | NaN | NaN | NaN | 22:46 | 4 | 14 |
| 9997 | 21100325 | 1610612762 | UTA | Utah | 2248 | Earl Watson | NaN | NaN | NaN | 20:36 | 1 | 3 |
| 9998 | 21800984 | 1610612743 | DEN | Denver | 1627823 | Juancho Hernangomez | NaN | NaN | NaN | 3:16 | 0 | 4 |
| 9999 | 20500061 | 1610612759 | SAS | San Antonio | 2225 | Tony Parker | NaN | G | NaN | 41:20 | 11 | 24 |
10000 rows × 12 columns
Let's take a look at this data graphically! We'll plot points (PTS) vs. baskets (FGM). As you can see this data is nice and linear:
import matplotlib.pyplot as plt
%matplotlib inline
plt.scatter(df["FGM"], df["PTS"])
<matplotlib.collections.PathCollection at 0x7fc2a0bb28e0>
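One way to put a number on "nice and linear" is the Pearson correlation between the two columns. The sketch below is a plain-Python illustration, not part of the original notebook: `pearson_r` is a hypothetical helper, and the sample values are only the ten (FGM, PTS) pairs visible in the table preview above, so the result says little about the full subset. On the real DataFrame, pandas' `df["FGM"].corr(df["PTS"])` should compute the same statistic.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (>= 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-deviations and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# The ten rows shown in the table preview (illustration only).
fgm = [1, 5, 2, 2, 0, 7, 4, 1, 0, 11]
pts = [2, 24, 6, 11, 1, 21, 14, 3, 4, 24]

r = pearson_r(fgm, pts)
print(f"Pearson r on the preview rows: {r:.3f}")
```

A value close to 1 supports fitting a straight line; a value near 0 would suggest a linear model is a poor fit.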
Let's predict the number of points a player will score given the number of baskets they made! As this dataset is quite linear, a linear regression should do nicely. We'll train a quick and easy one right here in the notebook:
from sklearn.linear_model import LinearRegression
locally_trained_model = LinearRegression()
locally_trained_model.fit(df[["FGM"]].values, df["PTS"])
LinearRegression()
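For a single feature, ordinary least squares has a closed form, and that is the kind of fit `LinearRegression` performs by default (a line with an intercept). The sketch below is a plain-Python illustration of that formula rather than scikit-learn's actual implementation; `fit_line` is a hypothetical helper, and the inputs are just the ten rows visible in the table preview, not the 10,000-row subset. On the fitted model itself, the corresponding numbers live in `locally_trained_model.coef_` and `locally_trained_model.intercept_`.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error for y ~ slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # covariance over variance
    intercept = mean_y - slope * mean_x    # line passes through the means
    return slope, intercept


# The ten rows shown in the table preview (illustration only).
fgm = [1, 5, 2, 2, 0, 7, 4, 1, 0, 11]
pts = [2, 24, 6, 11, 1, 21, 14, 3, 4, 24]

slope, intercept = fit_line(fgm, pts)
print(f"PTS ~ {slope:.2f} * FGM + {intercept:.2f}")
```

The slope can be read as the average number of points per made basket, which is why it should land somewhere between 2 and 3.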
With that done, we can plot the model against our datapoints:
plt.scatter(df["FGM"], df["PTS"])
plt.plot(df["FGM"], locally_trained_model.predict(df[["FGM"]].values), c="g") # Plot the locally-trained model
[<matplotlib.lines.Line2D at 0x7fc280739790>]
Looks good! Now let's use Modelbit to train this model in the cloud.
To get started we'll build a quick Python function that we'll execute in the cloud. This function works just the same as the code we ran locally, but note it pulls down the entire nba games full dataset. This dataset is quite large -- fit for cloud training!
def trainPointsPredictor():
    df = mb.get_dataset("nba games full") # Pull down the full dataset, not just the local sample
    l = LinearRegression()
    l.fit(df[["FGM"]].values, df["PTS"])
    return l # Return the trained model out of the Python function
Note also that our training function returns the trained model, which is how we'll get it back out of the cloud later. It's ready to train!
We just call mb.train to train our model in the cloud:
mb.train(trainPointsPredictor)
Sending training job...
Training Job "trainPointsPredictor" will begin shortly.
Now that the training job has begun, if we click through, we can see our job running in Modelbit!
We can see the code that's running, the Python environment it's running in, log lines, version control, and more!
Looks like it took 58 seconds to train and it's already done! By copying the code in "results", we can download the results right out of the cloud. Since we returned our model from the Python function, that's what'll download out of Modelbit:
job = mb.get_training_job("trainPointsPredictor", version=1647648592264)
cloud_trained_model = job.get_result(1647648592502) # returns LinearRegression()
Now that we have the cloud-trained model, let's add it to our plot:
plt.scatter(df["FGM"], df["PTS"])
plt.plot(df["FGM"], locally_trained_model.predict(df[["FGM"]].values), c="g")
plt.plot(df["FGM"], cloud_trained_model.predict(df[["FGM"]].values), c="m") # Plot the cloud-trained model
[<matplotlib.lines.Line2D at 0x7fc288163850>]
Looks like the cloud-trained model, represented as the magenta line, is a bit different from the local model, which is the green line. Good thing we trained in the cloud on the full dataset!
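To quantify "a bit different", one could compare the two fitted lines directly instead of eyeballing the plot. The sketch below is a minimal plain-Python illustration in which each model is reduced to a (slope, intercept) pair; `predict`, `prediction_gap`, and the coefficient values are hypothetical and are not results from this post. With the scikit-learn models above, each pair would come from the model's `coef_[0]` and `intercept_`.

```python
def predict(line, fgm):
    """Predicted points for a (slope, intercept) line at a given number of made baskets."""
    slope, intercept = line
    return slope * fgm + intercept


def prediction_gap(line_a, line_b, fgm_values):
    """Largest absolute difference in predicted points over the given FGM values."""
    return max(abs(predict(line_a, x) - predict(line_b, x)) for x in fgm_values)


# Hypothetical coefficients, for illustration only.
local_line = (2.5, 1.0)
cloud_line = (2.4, 1.5)

gap = prediction_gap(local_line, cloud_line, range(0, 16))
print(f"Largest gap over 0-15 made baskets: {gap:.2f} points")
```

Because both models are straight lines, the largest gap over a range always occurs at one of its endpoints, so checking a realistic range of FGM values is enough.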
That's all for today! Training jobs are now publicly available in Modelbit. Let us know what you think!