Development

Samsung Neural SDK provides sample benchmarking application code to execute a neural network model for image classification. This section describes how to execute a model using this sample application code on a Samsung device. It also covers preparation of the model using offline tools, an optional step prior to running the model on the device.

Step 1: Model processing for CPU/GPU

Neural SDK provides several tools for the benefit of model developers. These model tools include a growing set of optimizations, such as folding of batch normalization layers and encryption of weights, for deploying Caffe models with the SDK. The complete set of supported options is described in the README document provided along with the tools. Currently, these tools support only Caffe models.

1. Model converter

The model converter tool applies possible static optimizations to the model. Optimizing the model with this tool prior to executing it on the CPU is highly recommended; if the compute unit is the GPU, this step is mandatory. Additionally, the model converter provides the option of compressing the weights to different data types (e.g., float16).

Example: Apply static optimizations to the model

snap-caffe-tools/release/bin$ ./snap_model_converter -p model/squeezenet.prototxt -w model/squeezenet.caffemodel -o model/converted_squeezenet


Example: Weight compression to float16

snap-caffe-tools/release/bin$ ./snap_model_converter -p model/squeezenet.prototxt -w model/squeezenet.caffemodel -c float16 -o model/float16_squeezenet


2. Model deconverter

The model deconverter tool changes the weights back to normal BVLC-Caffe-compatible blobs. This is required if a model converted by the model converter tool (discussed above) is to be run with open-source BVLC Caffe. Note that the original accuracy of the model cannot be regained using this tool.

Example: Model conversion back to the original format

snap-caffe-tools/release/bin$ ./snap_model_deconverter -t float16 -s model/float16_squeezenet.caffemodel -d model/orig_squeezenet.caffemodel


3. Model encryption

Neural SDK provides a model encryption tool that performs fast symmetric-key encryption of the weights file. The tool uses the Advanced Encryption Standard (AES) in CBC mode, a block cipher mode of operation. After encryption, the encrypted model can be passed to the SDK APIs, which take care of decrypting the model prior to loading the network.

Example: Encryption

snap-caffe-tools/release/bin$ ./snap_model_encryptor -w model/squeezenet.caffemodel -o model/encr_squeezenet.caffemodel --encrypt 1


4. Bin-Txt converter

This is a debug tool for converting between text- and binary-serialized prototxt formats. It can be used to view the changes made to a .caffemodel for debugging, or to manually modify model contents.

Example: Binary to text conversion

snap-caffe-tools/release/bin$ ./bin_txt_converter -t txt -sp model/squeezenet.caffemodel -dp model/squeezenet.prototxt


Example: Text to binary conversion

snap-caffe-tools/release/bin$ ./bin_txt_converter -t bin -sp model/squeezenet.prototxt -dp model/squeezenet.caffemodel



Step 2: Model processing for NPU/DSP

To run the model on an NPU or DSP, it must first be converted to a specific vendor format. These conversions are performed using tools provided by the respective vendor (Qualcomm or SLSI, as the case may be).

1. Model conversion for Qualcomm devices

To execute the model on the NPU/DSP of a Qualcomm chipset-based device, it must be converted to the appropriate vendor format (.dlc). This is achieved using the model conversion tools provided by Qualcomm as part of the SNPE SDK. Conversion to the .dlc format is followed by quantization of the model.

Download the SNPE SDK (https://developer.qualcomm.com/software/qualcomm-neural-processing-sdk) and set it up as explained at https://developer.qualcomm.com/docs/snpe/setup.html.

In the procedures below, $SNPE_ROOT refers to the directory where the SNPE SDK is installed.

a. Caffe model conversion

Set the Android NDK path as ANDROID_NDK_ROOT.

Using the SNPE SDK with Caffe models requires a valid Caffe installation, which can be obtained and built from https://github.com/BVLC/caffe.

Once Caffe is installed, set up the SNPE SDK environment from $SNPE_ROOT using the script below, where $CAFFE_DIR is the path to the Caffe installation.

source bin/envsetup.sh -c $CAFFE_DIR


Now, the snpe-caffe-to-dlc tool is used to convert the trained Caffe model to the .dlc format, as shown below:

snpe-caffe-to-dlc [-h] -c CAFFE_TXT [-b CAFFE_BIN] [-d DLC]
Example:
snpe-caffe-to-dlc --caffe_txt models/mobilenet.prototxt --caffe_bin models/mobilenet.caffemodel --dlc models/mobilenet.dlc


b. TensorFlow model conversion

Set the Android NDK path as ANDROID_NDK_ROOT.

Using the SNPE SDK with TensorFlow models requires a valid TensorFlow installation, which can be obtained using the instructions given at https://www.tensorflow.org/get_started/os_setup.

Once TensorFlow is installed, set up the SNPE SDK environment from $SNPE_ROOT using the script below, where $TENSORFLOW_DIR is the path to the TensorFlow installation.

source bin/envsetup.sh -t $TENSORFLOW_DIR


Now, the snpe-tensorflow-to-dlc tool is used to convert a trained TensorFlow model to the .dlc format, as shown below:

snpe-tensorflow-to-dlc [-h] --graph GRAPH --input_dim INPUT_DIM --out_node OUT_NODE [--dlc DLC]

Example:
snpe-tensorflow-to-dlc --graph models/tensorflow_inception_graph.pb --input_dim input "1,224,224,3" --out_node "softmax0" --dlc models/tensorflow_inception.dlc


c. Quantization of .dlc model

After conversion, the generated .dlc model file needs to be quantized using the snpe-dlc-quantize tool prior to execution on the device.

snpe-dlc-quantize [-h] --input_dlc DLC --input_list INPUT_LIST --output_dlc QUANTIZED_DLC

Example:
snpe-dlc-quantize --input_dlc models/mobilenet.dlc --input_list target_raw_list.txt --output_dlc models/mobilenet_quantized.dlc


The INPUT_LIST parameter is the path to a file specifying the trial inputs. This should be a plain text file containing one or more absolute file paths per line. An example is given below, followed by a sketch for producing such .raw files:

target_raw_list.txt

/home/trial_img_set/image1.raw
/home/trial_img_set/image2.raw
/home/trial_img_set/image3.raw
/home/trial_img_set/image4.raw
/home/trial_img_set/image5.raw
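
Each .raw file listed above is assumed here to be a binary dump of the preprocessed float32 input tensor expected by the model; refer to the SNPE documentation for the exact layout. A minimal illustrative sketch of writing such a file from a float buffer is shown below (the helper name is not part of any SDK).

// Illustrative helper: writes a preprocessed float32 tensor to a .raw file
// for use in the quantizer input list.
#include <fstream>
#include <string>
#include <vector>

void WriteRawTensor(const std::vector<float> &tensor, const std::string &path) {
    std::ofstream file(path, std::ios::binary);
    file.write(reinterpret_cast<const char *>(tensor.data()),
               static_cast<std::streamsize>(tensor.size() * sizeof(float)));
}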

2. Model conversion for SLSI devices

To execute the model on the NPU of Exynos chipset-based devices, it must be converted to the appropriate vendor format (.tflite). Currently, the Exynos NPU supports only Caffe-trained models. Exynos provides a Docker environment that helps in profiling, quantizing, and compiling the model to produce the converted .tflite model. The Docker image is released by the vendor for each compiler version.

a. Caffe model conversion

Load the Docker image with the following command:

docker load --input ubuntu-16.04-npuc-rel-0.972.docker.image

After the Docker image is loaded on the host machine, enter the container by running the docker-npuc.sh script provided by the vendor along with the image.

chmod 777 docker-npuc.sh
./docker-npuc.sh  

Model conversion to the .tflite format involves three main steps: profiling, quantization, and NPU compilation.

First, the prototxt file needs to be modified as in the example below:

name: "MODEL_NAME"
layer {
  name: "data"
  type: "Input"
  top: "data"
  top: "label"
  transform_param {
    scale: 1.0
    mirror: false
    mean_value: 104.0
    mean_value: 117.0
    mean_value: 123.0
  }
  input_param {
    shape {
      dim: 1
      dim: 3
      dim: 300
      dim: 300
    }
  }
}

Now run the profiling script (profile.sh), followed by the quantizing script (quantize.sh), both of which are provided along with the Docker image.

Running the quantizing script generates three files: a quantized caffemodel, a prototxt, and a protobin.

The final step is to combine the generated quantized caffemodel and the profiling information by running the command below, which generates the .tflite model inside the specified output folder.

npulite-gen -m [quantized caffemodel] -o [output_folder] -b [protobin file] -p [profiled prototxt]
Example:
npulite-gen -m Q_mobilenet.caffemodel -o out_folder -b Q_mobilenet.protobin -p Q_mobilenet.prototxt

Step 3: Application development

The sample benchmark application provided can be used as a template to build your own image classification program.

In a nutshell, a model object and an execution options object are created from the model files and the desired execution options. A new session instance is created, and these objects are passed to the Open() API to load the model into memory. After setting the required input data, the Execute() API is called to run the model on the device and generate the inference results. Finally, calling the Close() API releases the resources held by the session, and the inference results can be used as needed.
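
The following is a minimal end-to-end sketch of this flow. It assumes an SDK header name (snap_sdk.h), enum values (CAFFE, CPU, FLOAT32), layer names (data, prob), and default-constructible SnapModel, ExecutionOptions, and SnapData objects; the exact identifiers are defined by the SDK headers and the sample benchmark code, and error handling is omitted.

// Illustrative sketch of the overall flow described above; identifiers marked
// as "assumed" are placeholders, not confirmed SDK names.
#include <iostream>
#include <string>
#include <vector>
#include "snap_sdk.h"                                        // assumed header name

int main() {
    using namespace snap_sdk;

    // 1-2. Create the model object and the execution options object.
    std::vector<std::string> inputNames = {"data"};          // assumed input layer name
    std::vector<std::string> outputNames = {"prob"};         // assumed output layer name
    SnapModel model;                                         // assumed default-constructible
    ErrCode status = SnapModel::Create(ModelFWType::CAFFE,   // assumed enum value
                                       inputNames, outputNames,
                                       "squeezenet.caffemodel", "squeezenet.prototxt", model);

    ExecutionOptions execTypeObj;
    status = ExecutionOptions::Create(ComputingUnit::CPU,          // assumed enum value
                                      ExecutionDataType::FLOAT32,  // assumed enum value
                                      model, execTypeObj);

    // 3-7. Create a session, load the model, set the input, and execute.
    SnapSessionInterface *session = nullptr;
    status = CreateSnapSession(&session);
    status = session->Open(model, execTypeObj);

    std::vector<SnapData> inputs(1), outputs;
    // ... fill inputs[0] via SetData() as described in step 4 below ...
    status = session->Execute(inputs, &outputs);

    // 8-10. Use the outputs, then close and destroy the session.
    status = session->Close();
    status = DestroySnapSession(session);
    return 0;
}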

1. Create a model object

Create a Model object by calling SnapModel::Create().

On success, the output parameter SnapModel &model contains the created SnapModel object.

a. Create the SnapModel object by passing the model as a file

static ErrCode Create(ModelFWType type, const std::vector<std::string> &input_names, const std::vector<std::string> &output_names, const std::string &weight_file, const std::string &model_file, SnapModel &model);
Example: 
snap_sdk::ErrCode status;  
status = snap_sdk::SnapModel::Create(type, inputNames, outputNames, weight_file, model_file, model);

OR

b. Create the SnapModel object by passing the model as a buffer

static ErrCode Create(ModelFWType type, std::vector<std::string> &input_names, const std::vector<std::string> &output_names, const std::string &model_file, const std::vector<char> &model_buffer, SnapModel &model);

Example: 
snap_sdk::ErrCode status;  
status = snap_sdk::SnapModel::Create(type, inputNames, outputNames, model_file, model_buffer, model);
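
If the model is supplied as a buffer, the model file must first be read into memory. Below is a minimal sketch of such a helper, assuming the element type of model_buffer is char; the helper name is illustrative and not part of the SDK.

// Illustrative helper (not part of the SDK): reads a model file into a byte
// buffer that can be passed as the model_buffer argument of Create().
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

std::vector<char> ReadModelFile(const std::string &path) {
    std::ifstream file(path, std::ios::binary);
    return std::vector<char>((std::istreambuf_iterator<char>(file)),
                             std::istreambuf_iterator<char>());
}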

2. Create an execution options object

Create an execution options object by calling ExecutionOptions::Create().

On success, the output parameter ExecutionOptions &execTypeObj contains the created ExecutionOptions object.

static ErrCode Create(ComputingUnit compUnit, ExecutionDataType execType, SnapModel &model, ExecutionOptions &execTypeObj);

Example:
status = snap_sdk::ExecutionOptions::Create(compUnit, execType, model, execTypeObj);

3. Create a session instance

Obtain an instance of SnapSessionInterface by calling CreateSnapSession().

extern "C++" ErrCode CreateSnapSession(SnapSessionInterface **session);

Example:
status = snap_sdk::CreateSnapSession(&session_);

4. Set the input data

Set the input data by calling SetData().

ErrCode SetData(const void *buffer, std::vector<int> shapes, DataType type, DataFormat format);

Example for setting NHWC and float32 options:
std::vector<int> shape = {1, modelHeight, modelWidth, modelChannels};
snap_sdk::DataType type = snap_sdk::DataType::FLOAT32;
snap_sdk::DataFormat format = snap_sdk::DataFormat::NHWC;
snap_sdk::SnapData::SetData(buffer, shape, type, format);
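
The buffer passed to SetData() must already be laid out in the chosen shape, data type, and format. Below is a minimal sketch of building such a float32 NHWC buffer from 8-bit RGB pixel data; the helper name, the pixel source (rgbPixels), and the per-channel mean values are illustrative assumptions (the means simply reuse the ones from the prototxt example in Step 2).

// Illustrative helper (not part of the SDK): packs 8-bit RGB pixels into a
// mean-subtracted float32 NHWC buffer whose data() can be passed to SetData().
#include <cstdint>
#include <vector>

std::vector<float> PrepareNhwcInput(const uint8_t *rgbPixels,
                                    int modelHeight, int modelWidth, int modelChannels) {
    const float mean[3] = {104.0f, 117.0f, 123.0f};   // assumed per-channel means
    std::vector<float> buffer(static_cast<size_t>(modelHeight) * modelWidth * modelChannels);
    for (int h = 0; h < modelHeight; ++h) {
        for (int w = 0; w < modelWidth; ++w) {
            for (int c = 0; c < modelChannels; ++c) {
                size_t idx = (static_cast<size_t>(h) * modelWidth + w) * modelChannels + c;  // NHWC index
                buffer[idx] = static_cast<float>(rgbPixels[idx]) - mean[c % 3];
            }
        }
    }
    return buffer;   // pass buffer.data() as the first argument of SetData()
}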

5. Set cache path for GPU

Set the GPU cache path (required only if the computing unit is GPU).

ErrCode SetGpuCacheSource(const std::string &cache_dir);

Example: 
status = execTypeObj.SetGpuCacheSource("/My_App/snap/tmp/");

6. Load the model

Load the model by calling Open() with the created SnapModel object and ExecutionOptions object as arguments, using the created session instance.

virtual ErrCode Open(SnapModel &model, ExecutionOptions &execOptions) = 0;

Example: 
status = session_->Open(model, execTypeObj);

7. Execute the model

Now, the model can be executed by calling Execute(), which will fill and return the "outputs" parameter.

virtual ErrCode Execute(const std::vector<SnapData> &inputs, std::vector<SnapData> *outputs) = 0; 

Example: 
status = session_->Execute(inputs, &outputs);

8. Extract the output

Extract the output on successful return of Execute().

void *GetBuffer() const;
std::vector<int> GetShapes() const;
DataType GetType() const;
DataFormat GetFormat() const;

Example: 
std::cout << "values : " << static_cast(outputs[0].GetShapes()[0]) ;
std::cout << "format : " << outputs[0].GetFormat(); 
float *outputData = static_cast(outputs[0].GetBuffer()); 
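
For an image classification model such as SqueezeNet, the output buffer typically holds one probability per class. Continuing from the example above, below is a minimal illustrative sketch of reading the top-1 class index, assuming the output is a flat float32 probability vector whose length is the last dimension of GetShapes().

// Illustrative sketch: outputData (from above) is assumed to point to one
// float32 probability per class.
int numClasses = outputs[0].GetShapes().back();   // assumed: last dimension is the class count
int best = 0;
for (int i = 1; i < numClasses; ++i) {
    if (outputData[i] > outputData[best]) {
        best = i;
    }
}
std::cout << "top-1 class index: " << best << ", probability: " << outputData[best] << std::endl;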

9. Close the session

The session instance can now be released by calling Close().

virtual ErrCode Close() = 0;

Example: 
status = session_->Close();

10. Destroy the session instance

After Close(), destroy the SnapSessionInterface instance created.

extern "C++" ErrCode DestroySnapSession(SnapSessionInterface *session);

Example: 
status = snap_sdk::DestroySnapSession(session_);

Step 4: Running the sample native benchmark application

The benchmarking code can be executed by passing the following arguments to the script file provided:

[Model name] [Android NDK version (at least r17b)] [Type of model (caffe/tensorflow)] [Preferred computing unit] [Execution data type] [Mode of execution] [Comma-separated output layer names] [N, number of samples] [H, height of the model] [W, width of the model] [C, number of channels] [is_encrypted_model (true/false)] [Input name of the model]

For example, the following instructions can be followed to run the sample benchmark code on the CPU for a Caffe model.

  1. Copy the weight file (e.g., squeezenet.caffemodel) to benchmark/data/model/.

  2. Copy the model file (e.g., squeezenet.prototxt) to benchmark/data/prototxt/.

  3. Copy the libsnap_vndk.so library to benchmark/snap_libs/arm64-v8a/clang/.

  4. Run the Benchmark.sh file with appropriate options from the benchmark folder.

e.g.: ./Benchmark.sh squeezenet.caffemodel ~/Android/Ndk/android-ndk-r17b/ caffe cpu float32 classifier prob 1 224 224 3 false data

The expected output is as follows:


Comparison of inference times

The performance of Neural SDK can be compared against another vendor SDK to see the visible difference in inference times.

In the section below, the Snapdragon Neural Processing Engine (SNPE) SDK is used to run the same model that was executed with Neural SDK in the previous section.

The procedure is as follows:

Convert the SqueezeNet (Caffe) model to the .dlc format (refer to Caffe model conversion in Step 2) to generate the model file squeezenet_snpe.dlc.

Quantization of the model is not required, as we are comparing performance on CPU.

Using the vendor benchmark application provided, execute the model as below:

  1. Copy the dlc file (e.g., squeezenet_snpe.dlc) to benchmark/data/model/.

  2. Run the Benchmark.sh file with appropriate options from the benchmark folder.

e.g.: ./Benchmark.sh squeezenet ~/Android/Ndk/android-ndk-r17b/ clang snpe cpu float32 lib64 classifier prob 1 224 224 3 false

The expected output is as follows:


When this inference time (53.3091 ms) is compared with that of Neural SDK (18.0702 ms), the difference is quite evident.


You're done!

Congratulations! You have successfully achieved the goal of this Code Lab activity. Now, you can accelerate neural network models on your device all by yourself! Watch the video to learn more about the advantages of using the Samsung Neural SDK.


If you're having any trouble with the activity, you may check out the link below.

Neural SDK - Sample Benchmark Code (16.98 MB)