The aiWare3P IP core incorporates a range of new features that result in significantly improved performance, lower power consumption, greater host CPU offload and simpler layout for larger chip designs.
“Our production-ready aiWare3P release brings together everything we know about accelerating neural networks for vision-based automotive AI inference applications,” said Marton Feher, senior vice president of hardware engineering for AImotive.
“We now have one of the automotive industry’s most efficient and compelling NN acceleration solutions for volume production L2/L2+/L3 AI. When complemented by AImotive’s significant algorithmic, safety and production expertise for automated driving, we believe we offer our customers the most technology-rich automotive-focused solutions available today.”
Each aiWare3P hardware IP core offers up to 16 TMAC/s (>32 TOPS) at 2 GHz, with multi-core and multi-chip implementations capable of delivering more than 50 TMAC/s (>100 INT8 TOPS) – ideal for multi-camera or heterogeneous sensor-rich applications. The core is designed for AEC-Q100 extended temperature operation, and includes a range of features to help users achieve certification to ASIL-B and above.
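The throughput figures above can be related with some simple arithmetic. The following is a minimal illustrative sketch, not AImotive tooling: it assumes the common convention that one multiply-accumulate counts as two operations, and the helper names (`tmacs_to_tops`, `macs_per_cycle`, `min_cores`) are hypothetical. It reproduces only the 2x floor behind the quoted ">" figures, and the core count ignores any multi-core scaling overhead.

```python
import math

# Assumption: 1 MAC = 1 multiply + 1 add = 2 operations (common convention).
OPS_PER_MAC = 2


def tmacs_to_tops(tmacs: float) -> float:
    """Convert tera-MACs per second to tera-operations per second."""
    return tmacs * OPS_PER_MAC


def macs_per_cycle(tmacs: float, clock_ghz: float) -> float:
    """MACs that must complete every clock cycle to sustain the given rate."""
    return (tmacs * 1e12) / (clock_ghz * 1e9)


def min_cores(target_tmacs: float, per_core_tmacs: float) -> int:
    """Smallest number of cores whose combined peak rate meets the target."""
    return math.ceil(target_tmacs / per_core_tmacs)


if __name__ == "__main__":
    core_tmacs = 16.0   # per-core peak quoted in the release
    clock_ghz = 2.0     # clock frequency quoted in the release
    print(tmacs_to_tops(core_tmacs))              # TOPS per core
    print(macs_per_cycle(core_tmacs, clock_ghz))  # MACs per clock cycle
    print(min_cores(50.0, core_tmacs))            # cores for 50 TMAC/s
```

Under these assumptions, a 16 TMAC/s core at 2 GHz corresponds to 8,000 MACs per clock cycle, and reaching 50 TMAC/s takes at least four such cores.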
Key upgrades include:
- Higher efficiency for a wider range of NN functions due to improved on-chip data reuse and movement, more sophisticated scheduling algorithms and upgraded external memory bandwidth management
- Support for a much larger portfolio of pre-optimised embedded activation and pooling functions, ensuring that most NNs execute entirely within the aiWare3P core without any host CPU intervention
- Real-time data compression, further reducing external memory bandwidth requirements – especially for larger input sizes and deeper networks
- Advanced cross-coupling between C-LAM convolution engines and F-LAM function engines, to further increase overlapped and interleaved execution efficiency
- Physical tile-based microarchitecture, enabling much easier physical implementation of large aiWare cores by minimizing difficult timing constraints on any process node
- Logical tile-based data management, enabling efficient workload scalability up to the maximum 16 TMAC/s per core, without the need for caches, NOCs or other complex multi-core processor-based approaches that create bottlenecks, reduce determinism and consume more power and silicon area
- Significantly upgraded SDK, including improved compiler and new performance analysis tools for both offline estimation and real-time fine-grained target hardware analysis
The aiWare3P hardware IP is being deployed in a range of L2/L2+ production solutions, as well as being adopted for studies of more advanced heterogeneous sensor applications. Customers include Nextchip for their forthcoming Apache5 Imaging Edge Processor, and ON Semiconductor for their collaborative project with AImotive to demonstrate advanced heterogeneous sensor fusion capabilities.
As part of their commitment to open benchmarking, AImotive will be releasing a full update to their public benchmark results in Q1 2020 based on the aiWare3P IP core. These results use well-controlled benchmarks that reflect real applications, such as high-resolution camera inputs, rather than unrealistic public benchmarks using 224x224 inputs.
The aiWare3P RTL will be shipping to all customers from January 2020.