●List 1

[Target]# xdputil query
{
"DPU IP Spec":{
"DPU Core Count":1,
"IP version":"v4.1.0",
"generation timestamp":"2022-11-30 19-15-00",
"git commit id":"ce8dd1",
"git commit time":2022113019,
"regmap":"1to1 version"
},
"VAI Version":{
"libvaip-core.so":"Xilinx vaip Version: 1.0.0-a176db67b19f94b0a31f9d24ef80322efe4494ad 2022-12-27-01:24:22 ",
"libvart-runner.so":"Xilinx vart-runner Version: 3.0.0-2efa5fe1e56c2b2c8a7e71e9fc1636242dd50a9f 2022-12-27-00:47:05 ",
"libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 3.0.0-1cccff04dc341c4a6287226828f90aed56005f4f 2022-12-20 10:29:01 [UTC] ",
"libxir.so":"Xilinx xir Version: xir-9204ac72103092a7b253a0c23ec7471481656940 2022-12-27-00:46:16",
"target_factory":"target-factory.3.0.0 860ed0499ab009084e2df3004eeb9ae710c26351"
},
"kernels":[
{
"DPU Arch":"DPUCZDX8G_ISA1_B4096",
"DPU Frequency (MHz)":300,
"IP Type":"DPU",
"Load Parallel":2,
"Load augmentation":"enable",
"Load minus mean":"disable",
"Save Parallel":2,
"XRT Frequency (MHz)":300,
"cu_addr":"0xa0010000",
"cu_handle":"0xaaaaebd6c7c0",
"cu_idx":0,
"cu_mask":1,
"cu_name":"DPUCZDX8G:DPUCZDX8G_1",
"device_id":0,
"fingerprint":"0x101000056010407",
"name":"DPU Core 0"
}
]
}
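
The JSON printed by xdputil query can also be read programmatically, for example to confirm which DPU architecture and fingerprint the board reports. The following is a minimal sketch, assuming the output has been captured into a string; the string here is an abbreviated copy of the listing above, and the helper name summarize_dpu is hypothetical.

```python
import json

# Abbreviated copy of the xdputil query output shown above.
XDPUTIL_OUTPUT = """
{
  "DPU IP Spec": {"DPU Core Count": 1, "IP version": "v4.1.0"},
  "kernels": [
    {
      "DPU Arch": "DPUCZDX8G_ISA1_B4096",
      "DPU Frequency (MHz)": 300,
      "fingerprint": "0x101000056010407",
      "name": "DPU Core 0"
    }
  ]
}
"""


def summarize_dpu(text):
    """Return (name, arch, frequency in MHz, fingerprint) for each DPU kernel."""
    info = json.loads(text)
    return [
        (k["name"], k["DPU Arch"], k["DPU Frequency (MHz)"], k["fingerprint"])
        for k in info.get("kernels", [])
    ]


for name, arch, mhz, fingerprint in summarize_dpu(XDPUTIL_OUTPUT):
    print(f"{name}: {arch} @ {mhz} MHz, fingerprint {fingerprint}")
```

On the board, the text could come from redirecting the command's output to a file, assuming nothing other than the JSON is printed.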

●Output of the [Docker]$ python resnet18_pruning.py --help command

[Docker]$ python resnet18_pruning.py --help
[VAIQ_NOTE]: Loading NNDCT kernels...
usage: resnet18_pruning.py [-h] [--gpus GPUS] [--method METHOD] [--load_slim_model] [--slim_model SLIM_MODEL] [--lr LR] [--sparsity SPARSITY]
[--ana_subset_len ANA_SUBSET_LEN] [--num_subnet NUM_SUBNET] [--epoches EPOCHES] [--pretrained PRETRAINED] [--data_dir DATA_DIR]
[--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--channel_divisible CHANNEL_DIVISIBLE]
[--momentum MOMENTUM]

optional arguments:
-h, --help show this help message and exit
--gpus GPUS String of available GPU number
--method METHOD iterative/one_step
--load_slim_model Load slim model
--slim_model SLIM_MODEL
Slim/Pruned model filepath
--lr LR Initial learning rate
--sparsity SPARSITY Sparsity ratio
--ana_subset_len ANA_SUBSET_LEN
Subset length for evaluating model in analysis, using the whole validation dataset if it is not set
--num_subnet NUM_SUBNET
Total number of subnet
--epoches EPOCHES Train epoch
--pretrained PRETRAINED
Pretrained model filepath
--data_dir DATA_DIR Dataset directory
--num_workers NUM_WORKERS
Number of workers used in dataloading
--batch_size BATCH_SIZE
Batch size
--weight_decay WEIGHT_DECAY
Weight decay
--channel_divisible CHANNEL_DIVISIBLE
channel_divisible
--momentum MOMENTUM Momentum

●Output of [Docker]$ cd Quantizer followed by [Docker]$ python resnet18_quant.py -h

[Docker]$ cd Quantizer
[Docker]$ python resnet18_quant.py -h

[VAIQ_NOTE]: Loading NNDCT kernels...
usage: resnet18_quant.py [-h] [--data_dir DATA_DIR] [--model_dir MODEL_DIR] [--config_file CONFIG_FILE] [--subset_len SUBSET_LEN] [--batch_size BATCH_SIZE] [--quant_mode {float,calib,test}] [--fast_finetune] [--deploy]
[--inspect] [--target [TARGET]]

optional arguments:
-h, --help show this help message and exit
--data_dir DATA_DIR Data set directory, when quant_mode=calib, it is for calibration, while quant_mode=test it is for evaluation
--model_dir MODEL_DIR
Trained model file path. Download pretrained model from the following url and put it in model_dir specified path: https://download.pytorch.org/models/resnet18-5c106cde.pth
--config_file CONFIG_FILE
quantization configuration file
--subset_len SUBSET_LEN
subset_len to evaluate model, using the whole validation dataset if it is not set
--batch_size BATCH_SIZE
input data batch size to evaluate model
--quant_mode {float,calib,test}
quantization mode. 0: no quantization, evaluate float model, calib: quantize, test: evaluate quantized model
--fast_finetune fast finetune model before calibration
--deploy export xmodel for deployment
--inspect inspect model
--target [TARGET] specify target device
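
The options above are typically combined in two passes: a calibration run (--quant_mode calib) followed by a test run that exports the xmodel (--quant_mode test with --deploy). The sketch below only assembles the two command lines and does not run them. The paths are hypothetical placeholders, the helper name quant_command is made up for illustration, and using a batch size of 1 and a subset length of 1 for the export step is an assumption based on the usual vai_q_pytorch workflow, not something the help text states.

```python
import shlex


def quant_command(quant_mode, data_dir, model_dir,
                  subset_len=None, batch_size=None, deploy=False):
    """Build the argv list for one resnet18_quant.py run (nothing is executed)."""
    argv = ["python", "resnet18_quant.py",
            "--quant_mode", quant_mode,
            "--data_dir", data_dir,
            "--model_dir", model_dir]
    if subset_len is not None:
        argv += ["--subset_len", str(subset_len)]
    if batch_size is not None:
        argv += ["--batch_size", str(batch_size)]
    if deploy:
        argv.append("--deploy")
    return argv


# Hypothetical paths; replace with the real dataset and model directories.
DATA_DIR = "/path/to/imagenet"
MODEL_DIR = "/path/to/model_dir"

steps = [
    # Pass 1: calibration on a subset of the data.
    quant_command("calib", DATA_DIR, MODEL_DIR, subset_len=200),
    # Pass 2: test mode with xmodel export (batch size 1 is an assumption).
    quant_command("test", DATA_DIR, MODEL_DIR,
                  subset_len=1, batch_size=1, deploy=True),
]

for argv in steps:
    print(shlex.join(argv))
```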

●Output of the [Docker]$ python validate_quant.py -h command

[Docker]$ python validate_quant.py -h
usage: validate_quant.py [-h] [--data-dir DIR] [--dataset NAME] [--split NAME] [--dataset-download] [--model NAME] [-j N] [-b N] [--img-size N] [--use-train-size] [--crop-pct N] [--crop-mode N] [--mean MEAN [MEAN ...]]
[--std STD [STD ...]] [--interpolation NAME] [--num-classes NUM_CLASSES] [--class-map FILENAME] [--gp POOL] [--log-freq N] [--checkpoint PATH] [--pretrained] [--test-pool] [--no-prefetcher] [--pin-mem]
[--channels-last] [--device DEVICE] [--use-ema] [--model-kwargs [MODEL_KWARGS [MODEL_KWARGS ...]]] [--torchscript] [--real-labels FILENAME] [--valid-labels FILENAME] [--quantized_out NAME]
[--quant_mode QUANT_MODE] [--config_file CONFIG_FILE] [--deploy] [--fast_finetune] [--inspect] [--subset_len SUBSET_LEN] [--ff_subset_len FF_SUBSET_LEN]
[DIR]

PyTorch ImageNet Validation

positional arguments:
DIR path to dataset (*deprecated*, use --data-dir)

optional arguments:
-h, --help show this help message and exit
--data-dir DIR path to dataset (root dir)
--dataset NAME dataset type + name ("<type>/<name>") (default: ImageFolder or ImageTar if empty)
--split NAME dataset split (default: validation)
--dataset-download Allow download of dataset for torch/ and tfds/ datasets that support it.
--model NAME, -m NAME
model architecture (default: dpn92)
-j N, --workers N number of data loading workers (default: 4)
-b N, --batch-size N mini-batch size (default: 256)
--img-size N Input image dimension, uses model default if empty
--use-train-size force use of train input size, even when test size is specified in pretrained cfg
--crop-pct N Input image center crop pct
--crop-mode N Input image crop mode (squash, border, center). Model default if None.
--mean MEAN [MEAN ...]
Override mean pixel value of dataset
--std STD [STD ...] Override std deviation of of dataset
--interpolation NAME Image resize interpolation type (overrides model)
--num-classes NUM_CLASSES
Number classes in dataset
--class-map FILENAME path to class to idx mapping file (default: "")
--gp POOL Global pool type, one of (fast, avg, max, avgmax, avgmaxc). Model default if None.
--log-freq N batch logging frequency (default: 32)
--checkpoint PATH path to latest checkpoint (default: none)
--pretrained use pre-trained model
--test-pool enable test time pool
--no-prefetcher disable fast prefetcher
--pin-mem Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU.
--channels-last Use channels_last memory layout
--device DEVICE Device (accelerator) to use.
--use-ema use ema version of weights if present
--model-kwargs [MODEL_KWARGS [MODEL_KWARGS ...]]
--torchscript torch.jit.script the full model
--real-labels FILENAME
Real labels JSON file for imagenet evaluation
--valid-labels FILENAME
Valid label indices txt file for validation of partial label space
--quantized_out NAME quantized_out path
--quant_mode QUANT_MODE
Should be one of float/calib/test
--config_file CONFIG_FILE
Provide quant_config.json
--deploy Deploy model
--fast_finetune Fast finetune the model
--inspect Inspect model
--subset_len SUBSET_LEN
Subset length to evaluate model, using the whole validation dataset if it is not set
--ff_subset_len FF_SUBSET_LEN
Subset length to evaluate model of fast finetune, using the whole validation dataset if it is not set

●Full version of List 24

(a)resnet18_pt

[Target] # ./test_jpeg_classification resnet18_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0709 10:35:01.134765 1506540 demo.hpp:1193] batch: 0 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
I0709 10:35:01.134989 1506540 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
I0709 10:35:01.135370 1506540 process_result.hpp:24] r.index 287 lynx, catamount, r.score 0.000430517
I0709 10:35:01.135494 1506540 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
I0709 10:35:01.135692 1506540 process_result.hpp:24] r.index 291 lion, king of beasts, Panthera leo, r.score 1.30005e-05
I0709 10:35:01.135864 1506540 process_result.hpp:24] r.index 282 tiger cat, r.score 1.01248e-05

(b)resnet18_pruned_pt

[Target] # ./test_jpeg_classification resnet18_pruned_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0709 10:35:45.505468 1506839 demo.hpp:1193] batch: 0 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
I0709 10:35:45.505685 1506839 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.975592
I0709 10:35:45.506140 1506839 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 0.0122809
I0709 10:35:45.506359 1506839 process_result.hpp:24] r.index 242 boxer, r.score 0.0011423
I0709 10:35:45.506470 1506839 process_result.hpp:24] r.index 285 Egyptian cat, r.score 0.00100808
I0709 10:35:45.506603 1506839 process_result.hpp:24] r.index 245 French bulldog, r.score 0.000889625
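
To compare the two runs without reading the logs by eye, the result lines can be parsed into (index, label, score) tuples. The following is a minimal sketch that works on the log lines copied from (a) and (b) above; the regular expression and the helper name parse_results are illustrative choices, not part of the Vitis AI samples.

```python
import re

# Result lines copied from run (a), resnet18_pt.
BASELINE_LOG = """\
I0709 10:35:01.134989 1506540 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
I0709 10:35:01.135370 1506540 process_result.hpp:24] r.index 287 lynx, catamount, r.score 0.000430517
I0709 10:35:01.135494 1506540 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
I0709 10:35:01.135692 1506540 process_result.hpp:24] r.index 291 lion, king of beasts, Panthera leo, r.score 1.30005e-05
I0709 10:35:01.135864 1506540 process_result.hpp:24] r.index 282 tiger cat, r.score 1.01248e-05
"""

# Result lines copied from run (b), resnet18_pruned_pt.
PRUNED_LOG = """\
I0709 10:35:45.505685 1506839 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.975592
I0709 10:35:45.506140 1506839 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 0.0122809
I0709 10:35:45.506359 1506839 process_result.hpp:24] r.index 242 boxer, r.score 0.0011423
I0709 10:35:45.506470 1506839 process_result.hpp:24] r.index 285 Egyptian cat, r.score 0.00100808
I0709 10:35:45.506603 1506839 process_result.hpp:24] r.index 245 French bulldog, r.score 0.000889625
"""

# Matches "r.index <int> <label>, r.score <float>" on a single line.
RESULT_RE = re.compile(r"r\.index (\d+) (.+), r\.score (\S+)")


def parse_results(log):
    """Return a list of (class index, label, score) in the order logged."""
    return [(int(idx), label, float(score))
            for idx, label, score in RESULT_RE.findall(log)]


baseline = parse_results(BASELINE_LOG)
pruned = parse_results(PRUNED_LOG)

# Class indices that appear in the top 5 of both models.
shared = {idx for idx, _, _ in baseline} & {idx for idx, _, _ in pruned}

print("top-1 baseline:", baseline[0][0], baseline[0][2])
print("top-1 pruned:  ", pruned[0][0], pruned[0][2])
print("indices in both top-5 lists:", sorted(shared))
```

Both models rank class 286 first; the pruned model's top-1 score is lower and the rest of its top 5 differs.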

●[Target] # vaitrace -h

[Target] # vaitrace -h
usage: vaitrace [-h] [-c [CONFIG]] [-d] [-o [TRACESAVETO]] [-t [TIMEOUT]] [-v] [-b] [-p] [--va] [--xat] [--txt_summary] [--json_summary] [--fine_grained] ...

positional arguments:
cmd

optional arguments:
-h, --help show this help message and exit
-c [CONFIG] Specify the config file
-d Enable debug
-o [TRACESAVETO] Save report to, only available for txt summary mode
-t [TIMEOUT] Tracing time limit in second, default value is 60
-v Show version
-b Bypass vaitrace, just run command
-p Trace python application
--va Generate trace data for Vitis Analyzer
--xat Save raw data, for debug usage
--txt_summary Display txt summary
--json_summary Display json summary
--fine_grained Fine grained mode

●Model analysis

・For resnet18_pt

[Target] # vaitrace --txt_summary ./test_jpeg_classification resnet18_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
INFO:root:VART will run xmodel in [NORMAL] mode
INFO:root:Executable file: /home/root/Vitis-AI/examples/vai_library/samples/classification/test_jpeg_classification
Analyzing symbol tables...
161 / 161
81 / 81
6 / 6
45 / 45
3 / 3
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0709 11:01:19.742046 1518158 demo.hpp:1193] batch: 0 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
I0709 11:01:19.742246 1518158 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
I0709 11:01:19.742642 1518158 process_result.hpp:24] r.index 287 lynx, catamount, r.score 0.000430517
I0709 11:01:19.742765 1518158 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
I0709 11:01:19.742966 1518158 process_result.hpp:24] r.index 291 lion, king of beasts, Panthera leo, r.score 1.30005e-05
I0709 11:01:19.743139 1518158 process_result.hpp:24] r.index 282 tiger cat, r.score 1.01248e-05

APM Stop Collecting
INFO:root:Generating ascii-table summary
INFO:root:Processing xmodel information
DPU Summary:
==========================================================================================================================================
DPU Id | Bat | DPU SubGraph | WL | SW_RT | HW_RT | Effic | LdWB | LdFM | StFM | AvgBw
------------+-----+----------------------------------------------------+-------+-------+-------+-------+--------+-------+-------+---------
DPUCZDX8G_1 | 1 | ResNet__ResNet_AdaptiveAvgPool2d_avgpool__4042_fix | 3.634 | 4.203 | 4.085 | 72.4 | 11.143 | 0.792 | 0.192 | 2968.838
==========================================================================================================================================

Notes:
"~0": Value is close to 0, Within range of (0, 0.001)
Bat: Batch size of the DPU instance
WL(Work Load): Computation workload (MAC indicates two operations), unit is GOP
SW_RT(Software Run time): The execution time calculate by software in milliseconds, unit is ms
HW_RT(Hareware Run time): The execution time from hareware operation in milliseconds, unit is ms
Effic(Efficiency): The DPU actual performance divided by peak theoretical performance,unit is %
Perf(Performance): The DPU performance in unit of GOP per second, unit is GOP/s
LdFM(Load Size of Feature Map): External memory load size of feature map, unit is MB
LdWB(Load Size of Weight and Bias): External memory load size of bias and weight, unit is MB
StFM(Store Size of Feature Map): External memory store size of feature map, unit is MB
AvgBw(Average bandwidth): External memory average bandwidth. unit is MB/s
....


CPU Functions(Not in Graph, e.g.: pre/post-processing, vai-runtime):
===========================================================================
Function | Device | Runs | AverageRunTime(ms)
---------------------------------------+--------+------+-------------------
cv::imread | CPU | 1 | 4.675
cv::resize | CPU | 1 | 0.410
xir::XrtCu::run | CPU | 1 | 4.185
vitis::ai::DpuTaskImp::run | CPU | 1 | 4.395
vitis::ai::ConfigurableDpuTaskImp::run | CPU | 1 | 4.408
vitis::ai::ClassificationImp::run_2 | CPU | 1 | 5.985
===========================================================================

・For resnet18_pruned_pt

[Target] # vaitrace --txt_summary ./test_jpeg_classification resnet18_pruned_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
INFO:root:VART will run xmodel in [NORMAL] mode
INFO:root:Executable file: /home/root/Vitis-AI/examples/vai_library/samples/classification/test_jpeg_classification
Analyzing symbol tables...
161 / 161
81 / 81
6 / 6
45 / 45
3 / 3
WARNING: Logging before InitGoogleLogging() is written to STDERR
I0709 11:03:35.327162 1520058 demo.hpp:1193] batch: 0 image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
I0709 11:03:35.327795 1520058 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.975592
I0709 11:03:35.328372 1520058 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 0.0122809
I0709 11:03:35.328680 1520058 process_result.hpp:24] r.index 242 boxer, r.score 0.0011423
I0709 11:03:35.328869 1520058 process_result.hpp:24] r.index 285 Egyptian cat, r.score 0.00100808
I0709 11:03:35.329085 1520058 process_result.hpp:24] r.index 245 French bulldog, r.score 0.000889625

APM Stop Collecting
INFO:root:Generating ascii-table summary
INFO:root:Processing xmodel information
DPU Summary:
=========================================================================================================================================
DPU Id | Bat | DPU SubGraph | WL | SW_RT | HW_RT | Effic | LdWB | LdFM | StFM | AvgBw
------------+-----+----------------------------------------------------+-------+-------+-------+-------+-------+-------+-------+---------
DPUCZDX8G_1 | 1 | ResNet__ResNet_AdaptiveAvgPool2d_avgpool__4042_fix | 1.814 | 2.645 | 2.534 | 58.3 | 5.296 | 0.703 | 0.133 | 2419.460
=========================================================================================================================================

Notes:
"~0": Value is close to 0, Within range of (0, 0.001)
Bat: Batch size of the DPU instance
WL(Work Load): Computation workload (MAC indicates two operations), unit is GOP
SW_RT(Software Run time): The execution time calculate by software in milliseconds, unit is ms
HW_RT(Hareware Run time): The execution time from hareware operation in milliseconds, unit is ms
Effic(Efficiency): The DPU actual performance divided by peak theoretical performance,unit is %
Perf(Performance): The DPU performance in unit of GOP per second, unit is GOP/s
LdFM(Load Size of Feature Map): External memory load size of feature map, unit is MB
LdWB(Load Size of Weight and Bias): External memory load size of bias and weight, unit is MB
StFM(Store Size of Feature Map): External memory store size of feature map, unit is MB
AvgBw(Average bandwidth): External memory average bandwidth. unit is MB/s
....


CPU Functions(Not in Graph, e.g.: pre/post-processing, vai-runtime):
===========================================================================
Function | Device | Runs | AverageRunTime(ms)
---------------------------------------+--------+------+-------------------
cv::imread | CPU | 1 | 4.646
cv::resize | CPU | 1 | 0.404
xir::XrtCu::run | CPU | 1 | 2.628
vitis::ai::DpuTaskImp::run | CPU | 1 | 2.835
vitis::ai::ConfigurableDpuTaskImp::run | CPU | 1 | 2.847
vitis::ai::ClassificationImp::run_2 | CPU | 1 | 4.506
===========================================================================
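
The Notes in the summaries define Effic as the actual DPU performance divided by the peak theoretical performance. The sketch below recomputes that column from WL and HW_RT for both models. It assumes that the B4096 configuration peaks at 4096 operations per clock cycle and that the 300 MHz DPU frequency reported by xdputil query applies; neither assumption is stated in the vaitrace output itself.

```python
# Values copied from the two "DPU Summary" tables above.
RUNS = {
    "resnet18_pt":        {"wl_gop": 3.634, "hw_rt_ms": 4.085, "effic_pct": 72.4},
    "resnet18_pruned_pt": {"wl_gop": 1.814, "hw_rt_ms": 2.534, "effic_pct": 58.3},
}

PEAK_OPS_PER_CYCLE = 4096  # assumption: peak ops per clock for a B4096 DPU
DPU_FREQ_MHZ = 300         # "DPU Frequency (MHz)" from xdputil query


def peak_gops():
    """Theoretical peak throughput in GOP/s."""
    return PEAK_OPS_PER_CYCLE * DPU_FREQ_MHZ / 1000.0


def achieved_gops(wl_gop, hw_rt_ms):
    """Actual throughput in GOP/s: workload divided by hardware run time."""
    return wl_gop / (hw_rt_ms / 1000.0)


def efficiency_pct(wl_gop, hw_rt_ms):
    """Achieved throughput as a percentage of the theoretical peak."""
    return 100.0 * achieved_gops(wl_gop, hw_rt_ms) / peak_gops()


for name, run in RUNS.items():
    perf = achieved_gops(run["wl_gop"], run["hw_rt_ms"])
    effic = efficiency_pct(run["wl_gop"], run["hw_rt_ms"])
    print(f"{name}: {perf:.1f} GOP/s, {effic:.1f} % (reported {run['effic_pct']} %)")

base, pruned = RUNS["resnet18_pt"], RUNS["resnet18_pruned_pt"]
print(f"workload ratio: {base['wl_gop'] / pruned['wl_gop']:.2f}x")
print(f"HW_RT speedup:  {base['hw_rt_ms'] / pruned['hw_rt_ms']:.2f}x")
```

Under these assumptions the computed values round to the reported 72.4 % and 58.3 %. The workload is roughly halved by pruning, while the hardware run time improves by about 1.6x, which is consistent with the lower efficiency of the pruned model.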

●Change that was needed on the author's PC to generate the XMODEL (run_ptq.sh)

[Docker]$ diff -Nur ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_ptq.sh script/run_ptq.sh
--- ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_ptq.sh
+++ script/run_ptq.sh
@@ -1,5 +1,5 @@
 # Calibration
-python validate_quant.py --quantized_out quantize_result/ptq --quant_mode calib --data-dir $2 -b 32 --model $1 --subset_len 5120
+python validate_quant.py --quantized_out quantize_result/ptq --quant_mode calib --data-dir $2 -b 32 --model $1 --subset_len 200
 # Test
 python validate_quant.py --quantized_out quantize_result/ptq --quant_mode test --data-dir $2 -b 32 --model $1
 # Deployment

●Change that was needed in the author's environment because of limited PC resources (run_fast_finetune.sh)

[Docker]$ diff -Nur ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_fast_finetune.sh script/run_fast_finetune.sh
--- ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_fast_finetune.sh
+++ script/run_fast_finetune.sh
@@ -1,5 +1,5 @@
 # Calibration with Fast_finetune
-python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode calib --data-dir $2 -b 32 --model $1 --fast_finetune --ff_subset_len 1024 --subset_len 5120
+python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode calib --data-dir $2 -b 32 --model $1 --fast_finetune --ff_subset_len 40 --subset_len 200
 # Test with Fast_finetune
 python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode test --data-dir $2 -b 32 --model $1 --fast_finetune
 # Deployment with Fast_finetune
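
The two diffs reduce --subset_len from 5120 to 200 and --ff_subset_len from 1024 to 40 while keeping -b 32. The sketch below shows what that means in number of batches. It assumes the data loader keeps a final partial batch, which the scripts above do not confirm.

```python
import math

BATCH_SIZE = 32  # the -b 32 option used in both scripts

# (original value, reduced value) taken from the two diffs above.
SUBSET_SIZES = {
    "--subset_len": (5120, 200),
    "--ff_subset_len": (1024, 40),
}


def num_batches(samples, batch_size=BATCH_SIZE):
    """Batches needed to cover `samples` images, keeping a final partial batch."""
    return math.ceil(samples / batch_size)


for option, (before, after) in SUBSET_SIZES.items():
    print(f"{option}: {before} images ({num_batches(before)} batches) -> "
          f"{after} images ({num_batches(after)} batches), "
          f"{before / after:.1f}x fewer images")
```

Both options are cut by the same factor, so the ratio between the calibration subset and the fast-finetune subset stays the same as in the original scripts.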
