Description
Describe the issue
In TF2, the full-integer quantized models produced by the TFLite Converter can only have float input and output types. This is a blocker for users who require int8 or uint8 input and/or output types.
UPDATE: We now support this workflow.
End-to-End Tutorial: https://colab.sandbox.google.com/github/google-coral/tutorials/blob/master/retrain_classification_ptq_tf2.ipynb
Only TFLite Conversion: Convert TF Models to TFLite Full-Integer models
You can refer to the code here, also given below:
import tensorflow as tf
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
def representative_dataset_gen():
  for _ in range(num_calibration_steps):
    # Get sample input data as a numpy array in a method of your choosing.
    yield [input]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8 # or tf.uint8
converter.inference_output_type = tf.int8 # or tf.uint8
tflite_model = converter.convert()
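In the snippet above, keras_model, num_calibration_steps, and input are placeholders you must fill in. As a rough, self-contained sketch of one way to do that (the calibration_data name and its (100, 100) shape are made-up values chosen to match the toy (1, 100) input used below), the representative dataset and a post-conversion check of the model's I/O types could look like this:
import numpy as np
import tensorflow as tf
# Hypothetical calibration set: 100 float samples shaped like the model's input.
calibration_data = np.random.uniform(low=0, high=10, size=(100, 100)).astype(np.float32)
def representative_dataset_gen():
  for sample in calibration_data:
    # The converter expects a list with one array per model input, batch dim included.
    yield [sample[np.newaxis, :]]
# After convert() succeeds, the model's input and output tensors should report int8.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
assert interpreter.get_input_details()[0]['dtype'] == np.int8
assert interpreter.get_output_details()[0]['dtype'] == np.int8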
Only TFLite Inference: Run inference on the TFLite model
Note the one caveat with integer-only models: you need to manually map (i.e., quantize) the float inputs to integer inputs during inference. Per the equation in the TensorFlow Lite 8-bit quantization specification document, real_value = (int_value - zero_point) * scale, so inputs are quantized with int_value = real_value / scale + zero_point and outputs are dequantized with the equation as written. Its equivalent code in Python is given below:
import numpy as np
import tensorflow as tf
# The TF model's input is a float array with values in the range [0, 10] and shape (1, 100)
np.random.seed(0)
tf_input = np.random.uniform(low=0, high=10, size=(1, 100)).astype(np.float32)
# Output of the TF model.
tf_output = keras_model.predict(tf_input)
# Output of the TFLite model.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()[0]
# Manually quantize the input from float to integer
scale, zero_point = input_details['quantization']
tflite_integer_input = tf_input / scale + zero_point
tflite_integer_input = tflite_integer_input.astype(input_details['dtype'])
interpreter.set_tensor(input_details['index'], tflite_integer_input)
interpreter.invoke()
output_details = interpreter.get_output_details()[0]
tflite_integer_output = interpreter.get_tensor(output_details['index'])
# Manually dequantize the output from integer to float
scale, zero_point = output_details['quantization']
tflite_output = tflite_integer_output.astype(np.float32)
tflite_output = (tflite_output - zero_point) * scale
# Verify that the TFLite model's output is approximately (expect some loss in
# accuracy due to quantization) the same as the TF model's output
assert np.allclose(tflite_output, tf_output, atol=1e-04)
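Once the converted model checks out, the serialized flatbuffer can also be written to disk and loaded back by path rather than kept in memory; a minimal sketch (the filename model_int8.tflite is arbitrary):
# Write the converted flatbuffer to disk...
with open('model_int8.tflite', 'wb') as f:
  f.write(tflite_model)
# ...and later load it by path instead of by content.
interpreter = tf.lite.Interpreter(model_path='model_int8.tflite')
interpreter.allocate_tensors()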