Serving a Machine Learning Model via a Web Application

Machine learning is awesome, and it is showing up everywhere for good reason. Most companies I interact with are at least kicking around the idea of using it in their business, and a smaller set has taken the plunge and hired engineers to make it happen.

However, in my experience, the new hire in charge of that initiative is usually a data scientist. They have no idea how to write code on your stack and likely never will. Instead, they will create amazing predictions using R libraries, TensorFlow, or PyTorch and impress the executives.

The execs will eat it up and ask how quickly the code can get into production. This is where it falls apart. Your engineers have no idea what the data scientist made or how a 1,000-line R file could ever be served from their Rails app, and the data scientist is just going to keep tuning the model while everyone else struggles, because they don't know (or care to know) Rails or any other web framework.

So what do you do? Try to find someone who knows both? A developer turned data scientist, or a data scientist turned developer? Either way, I am sure CyberCoders would be happy to find one for you after three months and a $60,000 finder's fee.

Do you dump the 50 binary checkpoint files generated during training on the developers and just tell them to figure it out? That is a pretty good way to destroy your roadmap.

The solution I see implemented most often is to treat the machine learning model as inspiration and simply write imperative code that gets as close as possible. Often this is some query that runs at a given interval and outputs something close to what the model predicts. To be clear, this means taking a team of engineers and having them reverse engineer code that already works, only to produce a suboptimal solution.

The solution

Save the model to some serialized format at the completion of training. HDF5 is a cakewalk, but there are plenty of options that achieve the same goal. In fact, pre-trained models are published for plenty of datasets, so learning how to load saved models also gives you access to models other people have trained on a wide variety of data. Let's take a look at how to do this in just a few lines of code with TensorFlow Keras.

Final code

Please check out the solution here and then read this post to better understand what is going on.

Training the model

To save a ton of time, let's simply use the model we trained and saved in this post. The last line of that file saves a file called saved_model.h5 to the same directory. This is the model that we are going to load in the code below.
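
If you do not have that post handy, the save itself is a one-liner at the end of training. A minimal sketch, assuming model is the trained Keras model from that post:

# Serialize the architecture, weights, and optimizer state into one HDF5 file
model.save('saved_model.h5')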

Import dependencies

import tensorflow as tf
# Eager execution lets us call the model directly and read tensors with .numpy()
tf.enable_eager_execution()
from tensorflow import keras

Load the model into memory

model = keras.models.load_model('saved_model.h5')
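
If you want a sanity check that the file deserialized correctly, printing the architecture is the quickest option:

# Print the layers and parameter counts of the restored model
model.summary()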

Query the model

The code below is almost exactly the same as the hello_world function that we wrote in the last post. The only difference is that here we need the maps that convert the vectorized integers back to characters (idx2char) and the characters to integers (char2idx). This is how we access our vocabulary. You can see that I simply stubbed them out, but you could also build them from the dataset if you have access to it, as shown in the sketch below.
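
For reference, here is roughly how those maps fall out of raw text. A quick sketch, assuming text holds the training corpus (here, just the string the model was trained on):

text = 'Hello World'
# The sorted, de-duplicated characters form the vocabulary
vocab = sorted(set(text))
# Map each character to its index, and keep a list for the reverse lookup
char2idx = {c: i for i, c in enumerate(vocab)}
idx2char = list(vocab)

For 'Hello World' this reproduces exactly the two stubs in the function below.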

def hello_world(model, start_string):
  char2idx = {' ': 0, 'H': 1, 'W': 2, 'd': 3, 'e': 4, 'l': 5, 'o': 6, 'r': 7}
  idx2char = [' ', 'H', 'W', 'd', 'e', 'l', 'o', 'r']
  # Number of characters to generate
  num_generate = 10

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)
      predictions = predictions / temperature
      # Sample the next character id from the predicted distribution
      predicted_id = tf.multinomial(predictions, num_samples=1)[-1, 0].numpy()
      input_eval = tf.expand_dims([predicted_id], 0)
      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))

Call the function

print(hello_world(model, start_string="H"))

A simple call to that function and you should see Hello World, or at least something close. The output can vary because tf.multinomial samples from the predicted distribution rather than always picking the most likely character.
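
If you want repeatable output for testing, one variation (mine, not from the original post) is to decode greedily: swap the tf.multinomial line inside the loop for an argmax over the final timestep's logits.

# Greedy decoding: always take the highest-probability next character
predicted_id = tf.argmax(predictions[-1]).numpy()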

You did it! You officially loaded a pre-trained model into memory and queried it. Now for the even easier part.

Serving the results via a web application

To keep things easy, we are going to query this model and serve the data via a Flask application, but TensorFlow has bindings for both Go and JavaScript, so you have your choice. If you are on something like a Rails app, I highly suggest setting up a micro-service.

Setup the application

from flask import Flask
from flask import render_template
from flask import jsonify
from flask import request
import tensorflow as tf
tf.enable_eager_execution()
from tensorflow import keras

app = Flask(__name__)

Load model into memory

This should look familiar.

model = keras.models.load_model('saved_model.h5')

Setup endpoints

You can see below that we have one endpoint to serve the template and a second POST endpoint that takes a string, queries the model, and responds with the result as JSON.

@app.route('/')
def index():
  return render_template('index.html')

@app.route('/query', methods=['POST'])
def query():
  return jsonify(hello_world(model, start_string=request.json['query']))
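
One edge case worth guarding: any character outside the model's eight-character vocabulary will raise a KeyError inside hello_world and surface as a 500. A hardened variant of the endpoint (my addition, not part of the original solution) might look like:

@app.route('/query', methods=['POST'])
def query():
  start_string = (request.json or {}).get('query', '')
  # Reject characters the model has never seen; char2idx would raise KeyError
  if not start_string or any(c not in ' HWdelor' for c in start_string):
    return jsonify(error='query may only contain characters from "Hello World"'), 400
  return jsonify(hello_world(model, start_string=start_string))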

Paste in the exact same hello_world function to query the model

def hello_world(model, start_string):
  char2idx = {' ': 0, 'H': 1, 'W': 2, 'd': 3, 'e': 4, 'l': 5, 'o': 6, 'r': 7}
  idx2char = [' ', 'H', 'W', 'd', 'e', 'l', 'o', 'r']
  # Number of characters to generate
  num_generate = 10

  # Converting our start string to numbers (vectorizing)
  input_eval = [char2idx[s] for s in start_string]
  input_eval = tf.expand_dims(input_eval, 0)

  text_generated = []
  temperature = 1.0

  # Here batch size == 1
  model.reset_states()
  for i in range(num_generate):
      predictions = model(input_eval)
      # remove the batch dimension
      predictions = tf.squeeze(predictions, 0)
      predictions = predictions / temperature

      # Sample the next character id from the predicted distribution
      predicted_id = tf.multinomial(predictions, num_samples=1)[-1, 0].numpy()
      input_eval = tf.expand_dims([predicted_id], 0)
      text_generated.append(idx2char[predicted_id])

  return (start_string + ''.join(text_generated))
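
Before writing a front end, you can boot the server and poke the endpoint directly. A minimal entry point, assuming the file is saved as app.py (the filename is my assumption, not from the original):

if __name__ == '__main__':
  # Development server only; put gunicorn or similar in front of it in production
  app.run(host='0.0.0.0', port=5000)

With that running, a POST to http://localhost:5000/query with a JSON body like {"query": "H"} should come back with the generated string.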

That is it. Seriously. Write whatever front end you want.

If you just want something quick to test with, here is a gross one with an inline React component.

<!DOCTYPE html>
<html>
  <head lang="en">
    <meta charset="UTF-8">
    <title>Flask React</title>
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <!-- styles -->
  </head>
  <body>
    <div class="container">
      <h1>Flask React</h1>
      <br>
      <div id="content"></div>
    </div>
    <!-- scripts -->
    <script src="https://cdnjs.cloudflare.com/ajax/libs/react/15.1.0/react.min.js"></script>
    <script src="https://cdnjs.cloudflare.com/ajax/libs/react/15.1.0/react-dom.min.js"></script>
    <script src="http://cdnjs.cloudflare.com/ajax/libs/react/0.13.3/JSXTransformer.js"></script>
    <script type="text/jsx">

      class NameForm extends React.Component {
        constructor(props) {
          super(props);
          this.state = {value: '', response: ''};

          this.handleChange = this.handleChange.bind(this);
          this.handleSubmit = this.handleSubmit.bind(this);
        }

        handleChange(event) {
          this.setState({value: event.target.value});
        }

        handleSubmit(event) {
          fetch('/query', {
            method: 'POST',
            headers: {
              'Accept': 'application/json',
              'Content-Type': 'application/json',
            },
            body: JSON.stringify({query: this.state.value})
          }).then(function(response) {
            return response.json();
          }).then((response) => {
            console.log(response);
            this.setState({response: response});
            return response;
          })
          event.preventDefault();
        }

        render() {
          return (
            <form onSubmit={this.handleSubmit}>
              <label>
                Name:
                <input type="text" value={this.state.value} onChange={this.handleChange} />
              </label>
              <div>{this.state.response}</div>
              <input type="submit" value="Submit" />
            </form>
          );
        }
      }

      ReactDOM.render(
        React.createElement(NameForm, null),
        document.getElementById('content')
      );

    </script>
  </body>
</html>

You did it! Hopefully your engineers and data scientists can now live in perfect harmony.