How to run half precision inference on a TensorRT model, written with TensorRT C++ API?

Discussion in 'Programming/Internet' started by bfra, Sep 12, 2018.

    I'm trying to run half-precision inference with a model written natively in the TensorRT C++ API (not parsed from another framework such as Caffe or TensorFlow). To the best of my knowledge there is no public working example of this; the closest thing I found is the sampleMLP sample code released with TensorRT, but its release notes say fp16 is not supported.

    My toy example code can be found in this repo. It contains the API-implemented architecture and inference routine, plus the Python script I use to convert my dictionary of trained weights to the .wtd TensorRT weight format.

    My toy architecture consists of a single convolution. The goal is to obtain similar results between fp32 and fp16, modulo some reasonable loss of precision. The code works with fp32, whereas with fp16 inference I obtain values of a totally different order of magnitude (~1e40), so it looks like I'm doing something wrong during the conversions.

    I'd appreciate any help in understanding the problem.

