Special Edition No.4: AMD Support Page

●Listing 1

[Target]# xdputil query
{
  "DPU IP Spec":{
    "DPU Core Count":1,
    "IP version":"v4.1.0",
    "generation timestamp":"2022-11-30 19-15-00",
    "git commit id":"ce8dd1",
    "git commit time":2022113019,
    "regmap":"1to1 version"
  },
  "VAI Version":{
    "libvaip-core.so":"Xilinx vaip Version: 1.0.0-a176db67b19f94b0a31f9d24ef80322efe4494ad  2022-12-27-01:24:22 ",
    "libvart-runner.so":"Xilinx vart-runner Version: 3.0.0-2efa5fe1e56c2b2c8a7e71e9fc1636242dd50a9f  2022-12-27-00:47:05 ",
    "libvitis_ai_library-dpu_task.so":"Xilinx vitis_ai_library dpu_task Version: 3.0.0-1cccff04dc341c4a6287226828f90aed56005f4f  2022-12-20 10:29:01 [UTC] ",
    "libxir.so":"Xilinx xir Version: xir-9204ac72103092a7b253a0c23ec7471481656940 2022-12-27-00:46:16",
    "target_factory":"target-factory.3.0.0 860ed0499ab009084e2df3004eeb9ae710c26351"
  },
  "kernels":[
    {
      "DPU Arch":"DPUCZDX8G_ISA1_B4096",
      "DPU Frequency (MHz)":300,
      "IP Type":"DPU",
      "Load Parallel":2,
      "Load augmentation":"enable",
      "Load minus mean":"disable",
      "Save Parallel":2,
      "XRT Frequency (MHz)":300,
      "cu_addr":"0xa0010000",
      "cu_handle":"0xaaaaebd6c7c0",
      "cu_idx":0,
      "cu_mask":1,
      "cu_name":"DPUCZDX8G:DPUCZDX8G_1",
      "device_id":0,
      "fingerprint":"0x101000056010407",
      "name":"DPU Core 0"
    }
  ]
}
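
The JSON printed by xdputil query can also be read from a script, for example to check which DPU a board exposes before deploying a model. The following is a minimal sketch, not part of the original article: SAMPLE is an abbreviated copy of the output above, and the helper name summarize_dpu is hypothetical. Only keys that appear in the listing are used.

```python
import json

# Abbreviated copy of the xdputil query output shown above.
SAMPLE = """
{
  "DPU IP Spec": {"DPU Core Count": 1, "IP version": "v4.1.0"},
  "kernels": [
    {
      "DPU Arch": "DPUCZDX8G_ISA1_B4096",
      "DPU Frequency (MHz)": 300,
      "fingerprint": "0x101000056010407",
      "name": "DPU Core 0"
    }
  ]
}
"""


def summarize_dpu(text):
    """Return (name, arch, frequency_mhz, fingerprint) for each DPU kernel."""
    info = json.loads(text)
    return [
        (k["name"], k["DPU Arch"], k["DPU Frequency (MHz)"], k["fingerprint"])
        for k in info["kernels"]
    ]


for name, arch, mhz, fingerprint in summarize_dpu(SAMPLE):
    print(f"{name}: {arch} @ {mhz} MHz, fingerprint {fingerprint}")
```

On a real board the text would come from the command itself (for example via subprocess) rather than from a pasted string.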

●Output of the [Docker]$ python resnet18_pruning.py --help command

    [Docker]$ python resnet18_pruning.py --help
    [VAIQ_NOTE]: Loading NNDCT kernels...
    usage: resnet18_pruning.py [-h] [--gpus GPUS] [--method METHOD] [--load_slim_model] [--slim_model SLIM_MODEL] [--lr LR] [--sparsity SPARSITY]
                            [--ana_subset_len ANA_SUBSET_LEN] [--num_subnet NUM_SUBNET] [--epoches EPOCHES] [--pretrained PRETRAINED] [--data_dir DATA_DIR]
                            [--num_workers NUM_WORKERS] [--batch_size BATCH_SIZE] [--weight_decay WEIGHT_DECAY] [--channel_divisible CHANNEL_DIVISIBLE]
                            [--momentum MOMENTUM]

    optional arguments:
    -h, --help            show this help message and exit
    --gpus GPUS           String of available GPU number
    --method METHOD       iterative/one_step
    --load_slim_model     Load slim model
    --slim_model SLIM_MODEL
                            Slim/Pruned model filepath
    --lr LR               Initial learning rate
    --sparsity SPARSITY   Sparsity ratio
    --ana_subset_len ANA_SUBSET_LEN
                            Subset length for evaluating model in analysis, using the whole validation dataset if it is not set
    --num_subnet NUM_SUBNET
                            Total number of subnet
    --epoches EPOCHES     Train epoch
    --pretrained PRETRAINED
                            Pretrained model filepath
    --data_dir DATA_DIR   Dataset directory
    --num_workers NUM_WORKERS
                            Number of workers used in dataloading
    --batch_size BATCH_SIZE
                            Batch size
    --weight_decay WEIGHT_DECAY
                            Weight decay
    --channel_divisible CHANNEL_DIVISIBLE
                            channel_divisible
    --momentum MOMENTUM   Momentum

●Output of the [Docker]$ cd Quantizer and [Docker]$ python resnet18_quant.py -h commands

[Docker]$ cd Quantizer
[Docker]$ python resnet18_quant.py -h

[VAIQ_NOTE]: Loading NNDCT kernels...
usage: resnet18_quant.py [-h] [--data_dir DATA_DIR] [--model_dir MODEL_DIR] [--config_file CONFIG_FILE] [--subset_len SUBSET_LEN] [--batch_size BATCH_SIZE] [--quant_mode {float,calib,test}] [--fast_finetune] [--deploy]
                         [--inspect] [--target [TARGET]]

optional arguments:
-h, --help            show this help message and exit
--data_dir DATA_DIR   Data set directory, when quant_mode=calib, it is for calibration, while quant_mode=test it is for evaluation
--model_dir MODEL_DIR
                        Trained model file path. Download pretrained model from the following url and put it in model_dir specified path: https://download.pytorch.org/models/resnet18-5c106cde.pth
--config_file CONFIG_FILE
                        quantization configuration file
--subset_len SUBSET_LEN
                        subset_len to evaluate model, using the whole validation dataset if it is not set
--batch_size BATCH_SIZE
                        input data batch size to evaluate model
--quant_mode {float,calib,test}
                        quantization mode. 0: no quantization, evaluate float model, calib: quantize, test: evaluate quantized model
--fast_finetune       fast finetune model before calibration
--deploy              export xmodel for deployment
--inspect             inspect model
--target [TARGET]     specify target device

●Output of the [Docker]$ python validate_quant.py -h command

[Docker]$ python validate_quant.py -h
usage: validate_quant.py [-h] [--data-dir DIR] [--dataset NAME] [--split NAME] [--dataset-download] [--model NAME] [-j N] [-b N] [--img-size N] [--use-train-size] [--crop-pct N] [--crop-mode N] [--mean MEAN [MEAN ...]]
                        [--std STD [STD ...]] [--interpolation NAME] [--num-classes NUM_CLASSES] [--class-map FILENAME] [--gp POOL] [--log-freq N] [--checkpoint PATH] [--pretrained] [--test-pool] [--no-prefetcher] [--pin-mem]
                        [--channels-last] [--device DEVICE] [--use-ema] [--model-kwargs [MODEL_KWARGS [MODEL_KWARGS ...]]] [--torchscript] [--real-labels FILENAME] [--valid-labels FILENAME] [--quantized_out NAME]
                        [--quant_mode QUANT_MODE] [--config_file CONFIG_FILE] [--deploy] [--fast_finetune] [--inspect] [--subset_len SUBSET_LEN] [--ff_subset_len FF_SUBSET_LEN]
                        [DIR]

PyTorch ImageNet Validation

positional arguments:
DIR                   path to dataset (*deprecated*, use --data-dir)

optional arguments:
-h, --help            show this help message and exit
--data-dir DIR        path to dataset (root dir)
--dataset NAME        dataset type + name ("<type>/<name>") (default: ImageFolder or ImageTar if empty)
--split NAME          dataset split (default: validation)
--dataset-download    Allow download of dataset for torch/ and tfds/ datasets that support it.
--model NAME, -m NAME
                        model architecture (default: dpn92)
-j N, --workers N     number of data loading workers (default: 4)
-b N, --batch-size N  mini-batch size (default: 256)
--img-size N          Input image dimension, uses model default if empty
--use-train-size      force use of train input size, even when test size is specified in pretrained cfg
--crop-pct N          Input image center crop pct
--crop-mode N         Input image crop mode (squash, border, center). Model default if None.
--mean MEAN [MEAN ...]
                        Override mean pixel value of dataset
--std STD [STD ...]   Override std deviation of of dataset
--interpolation NAME  Image resize interpolation type (overrides model)
--num-classes NUM_CLASSES
                        Number classes in dataset
--class-map FILENAME  path to class to idx mapping file (default: "")
--gp POOL             Global pool type, one of (fast, avg, max, avgmax, avgmaxc). Model default if None.
--log-freq N          batch logging frequency (default: 32)
--checkpoint PATH     path to latest checkpoint (default: none)
--pretrained          use pre-trained model
--test-pool           enable test time pool
--no-prefetcher       disable fast prefetcher
--pin-mem             Pin CPU memory in DataLoader for more efficient (sometimes) transfer to GPU.
--channels-last       Use channels_last memory layout
--device DEVICE       Device (accelerator) to use.
--use-ema             use ema version of weights if present
--model-kwargs [MODEL_KWARGS [MODEL_KWARGS ...]]
--torchscript         torch.jit.script the full model
--real-labels FILENAME
                        Real labels JSON file for imagenet evaluation
--valid-labels FILENAME
                        Valid label indices txt file for validation of partial label space
--quantized_out NAME  quantized_out path
--quant_mode QUANT_MODE
                        Should be one of float/calib/test
--config_file CONFIG_FILE
                        Provide quant_config.json
--deploy              Deploy model
--fast_finetune       Fast finetune the model
--inspect             Inspect model
--subset_len SUBSET_LEN
                        Subset length to evaluate model, using the whole validation dataset if it is not set
--ff_subset_len FF_SUBSET_LEN
                        Subset length to evaluate model of fast finetune, using the whole validation dataset if it is not set

●Full text of Listing 24

(a) resnet18_pt

        [Target] # ./test_jpeg_classification resnet18_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
        WARNING: Logging before InitGoogleLogging() is written to STDERR
        I0709 10:35:01.134765 1506540 demo.hpp:1193] batch: 0     image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
        I0709 10:35:01.134989 1506540 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
        I0709 10:35:01.135370 1506540 process_result.hpp:24] r.index 287 lynx, catamount, r.score 0.000430517
        I0709 10:35:01.135494 1506540 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
        I0709 10:35:01.135692 1506540 process_result.hpp:24] r.index 291 lion, king of beasts, Panthera leo, r.score 1.30005e-05
        I0709 10:35:01.135864 1506540 process_result.hpp:24] r.index 282 tiger cat, r.score 1.01248e-05

(b) resnet18_pruned_pt

        [Target] # ./test_jpeg_classification resnet18_pruned_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
        WARNING: Logging before InitGoogleLogging() is written to STDERR
        I0709 10:35:45.505468 1506839 demo.hpp:1193] batch: 0     image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
        I0709 10:35:45.505685 1506839 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.975592
        I0709 10:35:45.506140 1506839 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 0.0122809
        I0709 10:35:45.506359 1506839 process_result.hpp:24] r.index 242 boxer, r.score 0.0011423
        I0709 10:35:45.506470 1506839 process_result.hpp:24] r.index 285 Egyptian cat, r.score 0.00100808
        I0709 10:35:45.506603 1506839 process_result.hpp:24] r.index 245 French bulldog, r.score 0.000889625
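
To compare the two runs without reading the logs by eye, the r.index / r.score lines can be extracted with a regular expression. This is a minimal sketch that assumes the log format is exactly the one shown in (a) and (b); the names RESULT_LINE and parse_results are hypothetical and are not part of the Vitis AI samples.

```python
import re

# Matches the "r.index <id> <label>, r.score <value>" part of each log line.
RESULT_LINE = re.compile(r"r\.index (\d+) (.+), r\.score ([0-9.eE+-]+)")

# Two lines copied from listing (a) above.
SAMPLE_LOG = """\
I0709 10:35:01.134989 1506540 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
I0709 10:35:01.135494 1506540 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
"""


def parse_results(log_text):
    """Return a list of (class_index, label, score) tuples found in the log."""
    results = []
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match:
            results.append(
                (int(match.group(1)), match.group(2), float(match.group(3)))
            )
    return results


for index, label, score in parse_results(SAMPLE_LOG):
    print(index, score, label)
```

Applied to both listings, this shows the same top-1 class (index 286) in each, with the score dropping from 0.999475 to 0.975592 after pruning.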

●[Target] # vaitrace -h

[Target] # vaitrace -h
usage: vaitrace [-h] [-c [CONFIG]] [-d] [-o [TRACESAVETO]] [-t [TIMEOUT]] [-v] [-b] [-p] [--va] [--xat] [--txt_summary] [--json_summary] [--fine_grained] ...

positional arguments:
cmd

optional arguments:
-h, --help        show this help message and exit
-c [CONFIG]       Specify the config file
-d                Enable debug
-o [TRACESAVETO]  Save report to, only available for txt summary mode
-t [TIMEOUT]      Tracing time limit in second, default value is 60
-v                Show version
-b                Bypass vaitrace, just run command
-p                Trace python application
--va              Generate trace data for Vitis Analyzer
--xat             Save raw data, for debug usage
--txt_summary     Display txt summary
--json_summary    Display json summary
--fine_grained    Fine grained mode

●Model analysis

・For resnet18_pt

[Target] # vaitrace --txt_summary ./test_jpeg_classification resnet18_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
    INFO:root:VART will run xmodel in [NORMAL] mode
    INFO:root:Executable file: /home/root/Vitis-AI/examples/vai_library/samples/classification/test_jpeg_classification
    Analyzing symbol tables...
    161 / 161
    81 / 81
    6 / 6
    45 / 45
    3 / 3
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I0709 11:01:19.742046 1518158 demo.hpp:1193] batch: 0     image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
    I0709 11:01:19.742246 1518158 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.999475
    I0709 11:01:19.742642 1518158 process_result.hpp:24] r.index 287 lynx, catamount, r.score 0.000430517
    I0709 11:01:19.742765 1518158 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 4.53761e-05
    I0709 11:01:19.742966 1518158 process_result.hpp:24] r.index 291 lion, king of beasts, Panthera leo, r.score 1.30005e-05
    I0709 11:01:19.743139 1518158 process_result.hpp:24] r.index 282 tiger cat, r.score 1.01248e-05

    APM Stop Collecting
    INFO:root:Generating ascii-table summary
    INFO:root:Processing xmodel information
    DPU Summary:
    ==========================================================================================================================================
    DPU Id      | Bat | DPU SubGraph                                       | WL    | SW_RT | HW_RT | Effic | LdWB   | LdFM  | StFM  | AvgBw
    ------------+-----+----------------------------------------------------+-------+-------+-------+-------+--------+-------+-------+---------
    DPUCZDX8G_1 | 1   | ResNet__ResNet_AdaptiveAvgPool2d_avgpool__4042_fix | 3.634 | 4.203 | 4.085 | 72.4  | 11.143 | 0.792 | 0.192 | 2968.838
    ==========================================================================================================================================

    Notes:
    "~0": Value is close to 0, Within range of (0, 0.001)
    Bat: Batch size of the DPU instance
    WL(Work Load): Computation workload (MAC indicates two operations), unit is GOP
    SW_RT(Software Run time): The execution time calculate by software in milliseconds, unit is ms
    HW_RT(Hareware Run time): The execution time from hareware operation in milliseconds, unit is ms
    Effic(Efficiency): The DPU actual performance divided by peak theoretical performance,unit is %
    Perf(Performance): The DPU performance in unit of GOP per second, unit is GOP/s
    LdFM(Load Size of Feature Map): External memory load size of feature map, unit is MB
    LdWB(Load Size of Weight and Bias): External memory load size of bias and weight, unit is MB
    StFM(Store Size of Feature Map): External memory store size of feature map, unit is MB
    AvgBw(Average bandwidth): External memory average bandwidth. unit is MB/s
    ....


    CPU Functions(Not in Graph, e.g.: pre/post-processing, vai-runtime):
    ===========================================================================
    Function                               | Device | Runs | AverageRunTime(ms)
    ---------------------------------------+--------+------+-------------------
    cv::imread                             | CPU    | 1    | 4.675
    cv::resize                             | CPU    | 1    | 0.410
    xir::XrtCu::run                        | CPU    | 1    | 4.185
    vitis::ai::DpuTaskImp::run             | CPU    | 1    | 4.395
    vitis::ai::ConfigurableDpuTaskImp::run | CPU    | 1    | 4.408
    vitis::ai::ClassificationImp::run_2    | CPU    | 1    | 5.985
    ===========================================================================
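
The Effic and AvgBw columns can be cross-checked from the other columns using the definitions in the Notes. The sketch below is not vaitrace's own code. It assumes that B4096 in the DPU Arch name means 4096 operations per clock cycle, which at 300 MHz gives a peak of 1228.8 GOP/s, and that AvgBw is the sum of the three transfer sizes divided by HW_RT. The function names are hypothetical.

```python
def dpu_efficiency(workload_gop, hw_rt_ms, ops_per_cycle=4096, freq_mhz=300):
    """Achieved GOP/s divided by the assumed peak GOP/s, in percent."""
    achieved = workload_gop / (hw_rt_ms / 1000.0)  # GOP/s
    peak = ops_per_cycle * freq_mhz * 1e6 / 1e9    # GOP/s (assumed peak)
    return 100.0 * achieved / peak


def avg_bandwidth(ld_wb_mb, ld_fm_mb, st_fm_mb, hw_rt_ms):
    """Total external memory traffic divided by hardware run time, in MB/s."""
    return (ld_wb_mb + ld_fm_mb + st_fm_mb) / (hw_rt_ms / 1000.0)


# Values taken from the resnet18_pt row of the DPU Summary above.
print(round(dpu_efficiency(3.634, 4.085), 1))  # 72.4, matching the Effic column
print(round(avg_bandwidth(11.143, 0.792, 0.192, 4.085)))
```

Under these assumptions the efficiency reproduces the reported 72.4 %, and the bandwidth comes out within about 1 MB/s of the reported 2968.838 MB/s. The small difference is probably due to rounding of the displayed columns.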

・For resnet18_pruned_pt

[Target] # vaitrace --txt_summary ./test_jpeg_classification resnet18_pruned_pt ~/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
    INFO:root:VART will run xmodel in [NORMAL] mode
    INFO:root:Executable file: /home/root/Vitis-AI/examples/vai_library/samples/classification/test_jpeg_classification
    Analyzing symbol tables...
    161 / 161
    81 / 81
    6 / 6
    45 / 45
    3 / 3
    WARNING: Logging before InitGoogleLogging() is written to STDERR
    I0709 11:03:35.327162 1520058 demo.hpp:1193] batch: 0     image: /home/root/Vitis-AI/examples/vai_library/samples/classification/images/003.JPEG
    I0709 11:03:35.327795 1520058 process_result.hpp:24] r.index 286 cougar, puma, catamount, mountain lion, painter, panther, Felis concolor, r.score 0.975592
    I0709 11:03:35.328372 1520058 process_result.hpp:24] r.index 290 jaguar, panther, Panthera onca, Felis onca, r.score 0.0122809
    I0709 11:03:35.328680 1520058 process_result.hpp:24] r.index 242 boxer, r.score 0.0011423
    I0709 11:03:35.328869 1520058 process_result.hpp:24] r.index 285 Egyptian cat, r.score 0.00100808
    I0709 11:03:35.329085 1520058 process_result.hpp:24] r.index 245 French bulldog, r.score 0.000889625

    APM Stop Collecting
    INFO:root:Generating ascii-table summary
    INFO:root:Processing xmodel information
    DPU Summary:
    =========================================================================================================================================
    DPU Id      | Bat | DPU SubGraph                                       | WL    | SW_RT | HW_RT | Effic | LdWB  | LdFM  | StFM  | AvgBw
    ------------+-----+----------------------------------------------------+-------+-------+-------+-------+-------+-------+-------+---------
    DPUCZDX8G_1 | 1   | ResNet__ResNet_AdaptiveAvgPool2d_avgpool__4042_fix | 1.814 | 2.645 | 2.534 | 58.3  | 5.296 | 0.703 | 0.133 | 2419.460
    =========================================================================================================================================

    Notes:
    "~0": Value is close to 0, Within range of (0, 0.001)
    Bat: Batch size of the DPU instance
    WL(Work Load): Computation workload (MAC indicates two operations), unit is GOP
    SW_RT(Software Run time): The execution time calculate by software in milliseconds, unit is ms
    HW_RT(Hareware Run time): The execution time from hareware operation in milliseconds, unit is ms
    Effic(Efficiency): The DPU actual performance divided by peak theoretical performance,unit is %
    Perf(Performance): The DPU performance in unit of GOP per second, unit is GOP/s
    LdFM(Load Size of Feature Map): External memory load size of feature map, unit is MB
    LdWB(Load Size of Weight and Bias): External memory load size of bias and weight, unit is MB
    StFM(Store Size of Feature Map): External memory store size of feature map, unit is MB
    AvgBw(Average bandwidth): External memory average bandwidth. unit is MB/s
    ....


    CPU Functions(Not in Graph, e.g.: pre/post-processing, vai-runtime):
    ===========================================================================
    Function                               | Device | Runs | AverageRunTime(ms)
    ---------------------------------------+--------+------+-------------------
    cv::imread                             | CPU    | 1    | 4.646
    cv::resize                             | CPU    | 1    | 0.404
    xir::XrtCu::run                        | CPU    | 1    | 2.628
    vitis::ai::DpuTaskImp::run             | CPU    | 1    | 2.835
    vitis::ai::ConfigurableDpuTaskImp::run | CPU    | 1    | 2.847
    vitis::ai::ClassificationImp::run_2    | CPU    | 1    | 4.506
    ===========================================================================
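
The effect of pruning can be summarized by dividing the resnet18_pt figures by the resnet18_pruned_pt figures. This is a small sketch that uses only values copied from the two DPU Summary tables; the dictionary and function names are hypothetical.

```python
# WL in GOP, HW_RT in ms, LdWB in MB, copied from the two DPU Summary rows.
baseline = {"WL": 3.634, "HW_RT": 4.085, "LdWB": 11.143}
pruned = {"WL": 1.814, "HW_RT": 2.534, "LdWB": 5.296}


def reduction_ratios(before, after):
    """Return before/after for every metric present in `before`."""
    return {key: before[key] / after[key] for key in before}


for key, ratio in reduction_ratios(baseline, pruned).items():
    print(f"{key}: {ratio:.2f}x smaller after pruning")
```

The workload and the weight traffic are roughly halved, while the hardware run time shrinks by only about 1.6x. This is consistent with the Effic column falling from 72.4 % to 58.3 %.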

●Change that was required on the author's PC to generate the XMODEL (run_ptq.sh)

[Docker]$ diff -Nur ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_ptq.sh script/run_ptq.sh
--- ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_ptq.sh
+++ script/run_ptq.sh
@@ -1,5 +1,5 @@
 # Calibration
-python validate_quant.py  --quantized_out quantize_result/ptq --quant_mode calib --data-dir $2 -b 32 --model $1 --subset_len 5120
+python validate_quant.py  --quantized_out quantize_result/ptq --quant_mode calib --data-dir $2 -b 32 --model $1 --subset_len 200
 # Test
 python validate_quant.py  --quantized_out quantize_result/ptq --quant_mode test  --data-dir $2 -b 32 --model $1
 # Deployment

●Change that was required in the author's environment because of limited PC resources (run_fast_finetune.sh)

[Docker]$ diff -Nur ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_fast_finetune.sh script/run_fast_finetune.sh
--- ../../src/vai_quantizer/vai_q_pytorch/example/timm/script/run_fast_finetune.sh
+++ script/run_fast_finetune.sh
@@ -1,5 +1,5 @@
 # Calibration with Fast_finetune
-python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode calib --data-dir $2 -b 32 --model $1 --fast_finetune --ff_subset_len 1024 --subset_len 5120
+python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode calib --data-dir $2 -b 32 --model $1 --fast_finetune --ff_subset_len 40 --subset_len 200
 # Test with Fast_finetune
 python validate_quant.py --quantized_out quantize_result/fast_finetune --quant_mode test  --data-dir $2 -b 32 --model $1 --fast_finetune
 # Deployment with Fast_finetune
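
To get a rough sense of how much work the smaller subsets save, the number of batches of size 32 (the -b 32 in both scripts) can be counted for each setting. This is only a sketch. It assumes the data loader keeps a final partial batch and that --ff_subset_len is batched the same way, neither of which is confirmed by the listings. The helper name num_batches is hypothetical.

```python
import math


def num_batches(subset_len, batch_size=32):
    """Batches needed to cover subset_len images (assumes the last partial batch is kept)."""
    return math.ceil(subset_len / batch_size)


settings = [
    ("original --subset_len", 5120),
    ("reduced --subset_len", 200),
    ("original --ff_subset_len", 1024),
    ("reduced --ff_subset_len", 40),
]

for label, length in settings:
    print(f"{label} {length}: {num_batches(length)} batches")
```

Under these assumptions calibration drops from 160 batches to 7, and fast finetune from 32 batches to 2.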
