TVM Development Report: November-December 2019

November 2019 Development Report

Community

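The community held two meetups, one in the Bay Area and one in Shanghai; the related slides have been published.

The community welcomed a new reviewer, Logan Weber (@weberlo).

The community discussion forum (discuss.tvm.ai) continued to grow healthily, recording over 103,000 page views and over 3,300 unique visitors during the month.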

New Features and Improvements

The community made progress on many fronts over the past month. Some notable features and improvements:

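  • Added support for NVIDIA TensorCore, reaching performance comparable to the native libraries (cuBLAS, cuDNN). (#4105, #4353)
  • Implemented a C++ RPC server for embedded devices that runs without a Python runtime. (#4281)
  • Added explicit memory and tensor allocation to Relay. (#3560)
  • Improved operator performance. (reduction ops #4158, batch matmul #4242)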

In addition, we successfully released Apache TVM (incubating) v0.6, the first release since the project entered the Apache Incubator.

More details are listed below.

Compiler and VM Improvement

Quantization

Performance

Operator Support

User Interface and Frontend

Language, Runtime and Hardware Support

Documents, Test and Build

Bugfix

People Whose Pull Requests are Updated:

Note: The format is name(number of activities, area list). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

anijain2305 (36), tqchen (18), sgrechanik-h (18), yzhliu (15), icemelon9 (11), FrozenGene (11), kevinthesun (8), comaniac (8), vinx13 (7), wweic (7), tmoreau89 (7), yongwww (7), cchung100m (7), liangfu (6), shoubhik (6), zhiics (5), eqy (5), t-vi (5), hcho3 (5), merrymercy (4), jroesch (4), apivovarov (4), hlu1 (4), alexgl-github (4), MarisaKirisame (3), srkreddy1238 (3), Laurawly (3), soiferj (3), jwfromm (3), petrex (3), jdavies-huawei (3), ZihengJiang (2), were (2), huajsj (2), inadob (2), zxy844288792 (2), csarofeen (2), makihiro (2), vmiheer (2), hgt312 (2), KimBioInfoStudio (2), siju-samuel (1), masahi (1), nhynes (1), Huyuwei (1), vegaluisjose (1), lixiaoquan (1), weberlo (1), junrushao1994 (1), antinucleon (1), liangdzou (1), cbalint13 (1), imorinaga (1), yuruofeifei (1), u99127 (1), kimishpatel (1), gemfield (1), Rasterer (1), tristan-arm (1), Hzfengsy (1), kice (1), jackwish (1), liaha (1), paddyhoran (1), bindog (1), jmorrill (1), mbarrett97 (1), ariwaranosai (1), ic (1), minminsun (1), lsy643 (1), tweej (1), trevor-m (1), XFPlus (1), abuccts (1), autumnqin (1), ekalda (1), jason-song-dev (1), PeikeLi (1), gittripley (1), zhuochenKIDD (1), ziyu-guo (1)

People Who Reviewed Pull Requests:

Note: The format is name(number of activities).

tqchen (139), zhiics (62), yzhliu (55), vinx13 (28), kevinthesun (26), FrozenGene (23), anijain2305 (20), tmoreau89 (20), merrymercy (18), icemelon9 (18), masahi (18), yongwww (18), jackwish (17), ZihengJiang (16), wweic (16), jroesch (14), soiferj (13), junrushao1994 (13), MarisaKirisame (12), Laurawly (8), ajtulloch (8), comaniac (8), u99127 (8), srkreddy1238 (7), kazum (7), liangfu (7), shoubhik (6), vegaluisjose (5), weberlo (5), cchung100m (5), jwfromm (5), eqy (4), apivovarov (4), slyubomirsky (4), yidawang (4), cbalint13 (4), grwlf (4), broune (4), huajsj (3), petrex (3), Huyuwei (2), antinucleon (2), xqdan (2), derisavi (2), Hzfengsy (2), reminisce (2), KimBioInfoStudio (2), minminsun (2), siju-samuel (1), nhynes (1), PariksheetPinjari909 (1), mshawcroft (1), zhreshold (1), sgrechanik-h (1), t-vi (1), hcho3 (1), adityaatluri (1), denis0x0D (1), yinghai (1), altanh (1), umangyadav (1), lly-zero-one (1), kaitingwang (1), SWu (1), TaoLv (1), ZhennanQin (1), jmorrill (1), Leo-arm (1), zhuochenKIDD (1)
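
The name(number of activities) format used in the lists above is easy to process with a short script, for example to total the activity or pick out the most active participants. The following is a minimal sketch, not part of the original report: the helper names `parse_activity_list` and `top_contributors` are hypothetical, and it assumes each entry is a GitHub handle (letters, digits, hyphens, no spaces) followed by a count in parentheses.

```python
import re

# One entry looks like "tqchen (139)": a handle, then a count in parentheses.
# Assumption: handles contain only letters, digits and hyphens.
ENTRY = re.compile(r"([A-Za-z0-9][A-Za-z0-9-]*)\s*\(\s*(\d+)\s*\)")


def parse_activity_list(text):
    """Return [(name, count), ...] in the order the entries appear."""
    return [(name, int(count)) for name, count in ENTRY.findall(text)]


def top_contributors(text, n=3):
    """Return the n entries with the highest counts (ties keep list order)."""
    entries = parse_activity_list(text)
    return sorted(entries, key=lambda entry: -entry[1])[:n]


# A shortened sample taken from the reviewer list above.
sample = "tqchen (139), zhiics (62), yzhliu (55), vinx13 (28), t-vi (1)"
entries = parse_activity_list(sample)
total = sum(count for _, count in entries)

print(top_contributors(sample, 2))
print(total)
```

Pasting a full list from the report in place of `sample` gives the same kind of summary for that month. As the disclaimer above notes, these counts measure activity only, not the significance of a contribution.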

December 2019 Development Report

Community

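The second TVM Conference was held at the University of Washington in Seattle, with talks from organizations across the community (including AWS, Facebook, Alibaba, Cornell, Microsoft, ARM, Xilinx, OctoML, Qualcomm, Stanford, and Intel). Videos and slides are available at sampl.cs.washington.edu.

The community also welcomed a new PPMC member, Jared Roesch (@jroesch), and a new reviewer, Neo Chien (@cchung100m).

The community discussion forum (discuss.tvm.ai) recorded over 90,000 page views and over 2,700 unique visitors during the month. This is slightly below November's figures, probably because of the holiday season in the United States.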

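New Features and Improvements

Over the past month, the community released TVM v0.6, fully deprecated NNVM in favor of Relay, implemented bring-your-own-codegen in Relay and the Relay VM, standardized graph module export, and extended uTVM to support its first microcontroller platform, the ARM STM32F746XX. The code base also underwent a unified object system refactor, gained multiple operators including 3D operators, improved INT8 GEMM performance, and received new schedules for ROCm and ARM NHWC. Operator coverage increased for the TF to Relay, TFLite to Relay, and ONNX to Relay converters. A handy layout transformation pass was added to Relay. The RPC runtime was extended to support TFLite model evaluation on low-power devices. The cycle-accurate TSIM simulator received several enhancements. Finally, many bug fixes were made across the community.

More details are listed below.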

Compiler Support

  • Add function attributes to IR hash (#4479)
  • Intrinsic dispatching with OCML instead of LLVM for ROCm (#4499)
  • IR readability enhancement (#4501)
  • Add bfloat16 typeflag support (#4525)
  • External codegen support in Relay (#4482) + VM (#4544)
  • Deprecating NNVM (#4535, #4562, #4565, #4571)
  • Cythonize NDArray.copyto (#4549)
  • Add convertlayout pass in Relay (#4335, #4600)
  • Relay passes lookup overhead optimization (#4594)
  • Unified Object System runtime refactor (#4578, #4581, #4603)
  • VM profiler: sort VM stats by time (#4601)

Operator Support and AutoTVM

  • Add strided_set operation (#4303)
  • Add shape function for zero, zeros_like, ones, ones_like (#4448), tile (#4441)
  • Add support for conv3d (#4400), pool3d (#4478), 3d upsampling ops (#4584)
  • Add group convolution for VTA (#4421)
  • Adding ROCM schedules for TOPI (#4507)
  • Add 1d deconvolution op (#4476)
  • Allow batch matmul to be fused into injective ops (#4537)
  • Add native depthtospace and spacetodepth operators (#4566)
  • NHWC conv2d schedule templates for ARM (#3859)
  • INT8 GEMM performance enhancement using cuBLAS (#4550)

User Interface and Frontend

  • TFLite parser support for transpose_conv (#4440), unpack (#4447)
  • LLDB pretty printers for relay (#4453)
  • ONNX to Relay converter op support: expand op (#4483), auto_pad in conv and convtranspose (#4563)
  • TF to Relay converter op support: bilinear and neighbour implementation refactor (#4504), max_pool3d (#4551), conv2d_transpose with “same” padding support for larger than 1x1 kernels
  • Remove unnecessary cast of constants in ONNX converter (#4573)

Runtime

  • Add ADTObject POD container type (#4346)
  • Add cuDNN conv3d support (#4418)
  • Update RPC runtime to allow remote module as arg (#4462)
  • TFLite RPC runtime (#4439)
  • Refactoring system lib and dso lib into library module (#4481)
  • Standardized graph runtime export (#4532)

Documents, Test, and Build

  • Adding benchmark log format doc (#4366)
  • Adding AMD codegen unit tests (#4509)
  • Add Ninja build system to installation docs (#4554)
  • Add v0.6 release (#4558)

Accelerator and Microcontroller Support

  • uTVM support for ARM STM32F746XX boards (#4274)
  • Speedup TSIM with multi-threading (#4491)
  • Improve TSIM virtual memory mapping (#4545)
  • Cleanup legacy verilog code (#4576)

Fixes

  • Doc/comment fixes (#4452, #4463, #4469, #4493, #4397, #4580, #4585, #4591)
  • MSVC / Windows fixes (#4455, #4569)
  • Fix Makefile for howto_deploy (#4457)
  • Fix GCC 4.8 compatibility (#4461)
  • Fix search path to build libtvm_topi.so (#4467)
  • Fix for conv2d_transpose CUDA compilation (#4472)
  • Fix for LLVM 10.0 codegen (#4480, #4515)
  • Fix alter op layout when calling global var (#4454)
  • Fix float2half_rn support for cuda compute capabilities < 53 (#4489)
  • Fix compile errors for OpenCL backends (#4492)
  • Fix serialization precision loss (#4503)
  • Fix hybrid script to support array of tensors (#4494)
  • Fix annotation for multiply op (#4458)
  • Fix Dockerfile for linter CI (#4506)
  • Fix TF resize for dynamic size models (#4510)
  • Fix bias_add gradient (#4516)
  • Fix tanH unit test function call (#4517)
  • Fix extra reshape parameter for ONNX (#4524)
  • Fix crash caused by empty TOPI config (#4520)
  • Fix ONNX shape op type to use int64 (#4528)
  • Fix crash in TSIM virtual memory driver (#4527)
  • Replace deprecated python library in setup script (#4533)
  • Fix NMS max_output_size loop (#4541)
  • Fix style in IR mutator and IR visitor (#4561)
  • Fix compiler warning (#4559)
  • Fix to get end to end inference on Chisel VTA (#4574)
  • Fix LLVM build by adding missing intrinsics headers (#4575)
  • Fix context creation in quantization (#4582)
  • Fix NDArray SaveDLTensor signature (#4586)
  • Fix dense pack schedule for x86 (#4539)
  • Fix for broadcast tensor of scalar type (#4577)
  • Datatype refactor (#4513, #4560)
  • Add const qualifiers for NDArray container (#4590)
  • Fix TF <= 1.12 compatibility (#4593)
  • Fix for graph debug runtime (#4598)
  • Disable copy constructor for external codegen (#4597)
  • Make ADT tag signed (#4605)

People Who Reviewed Pull Requests:

Note: The format is name(number of activities). Disclaimer: the number of activities does not directly correspond to the community's view of the significance of contributions.

tqchen (22), zhiics (7), icemelon9 (5), masahi (5), yongwww (5), liangfu (5), inadob (5), anijain2305 (4), apivovarov (4), liangdzou (4), optima2005 (4), kevinthesun (3), FrozenGene (3), cchung100m (3), petrex (3), yzhliu (2), wweic (2), eqy (2), abergeron (2), jwfromm (2), BenjaminTu (2), kice (2), mbarrett97 (2), dmakarov (2), wyc-ruiker (2), jmorrill (2), zhuochenKIDD (2), ZihengJiang (1), MarisaKirisame (1), Laurawly (1), tmoreau89 (1), nhynes (1), kazum (1), jroesch (1), slyubomirsky (1), soiferj (1), weberlo (1), junrushao1994 (1), comaniac (1), t-vi (1), u99127 (1), kimishpatel (1), alexgl-github (1), Hzfengsy (1), vmiheer (1), reminisce (1), jackwish (1), spectrometerHBH (1), JammyZhou (1), SWu (1), lhutton1 (1), cylinbao (1), imharrywu (1), uenoku (1), HisiFish (1), leandron (1), Leo-arm (1), aksarben09 (1), tkclimb (1), abuccts (1), ElaineBao (1), qingyunqu (1), anwang2009 (1), KnowingNothing (1)

People Whose Pull Requests are Updated:

Note: The format is name(number of activities, area list)

tqchen (54), zhiics (24), kevinthesun (21), masahi (18), ZihengJiang (15), yzhliu (13), vinx13 (11), tmoreau89 (11), junrushao1994 (11), comaniac (11), wweic (9), FrozenGene (9), icemelon9 (8), yongwww (7), cchung100m (6), jwfromm (6), MarisaKirisame (5), apivovarov (5), soiferj (5), u99127 (5), optima2005 (3), srkreddy1238 (2), anijain2305 (2), liangfu (2), kice (2), merrymercy (1), Laurawly (1), nhynes (1), PariksheetPinjari909 (1), jroesch (1), Huyuwei (1), slyubomirsky (1), vegaluisjose (1), were (1), ajtulloch (1), weberlo (1), antinucleon (1), petrex (1), ehsanmok (1), Hzfengsy (1), yinghai (1), umangyadav (1), jackwish (1), yuluny2 (1), TaoLv (1), qingyunqu (1)

Published on 2020-01-22
