System Architecture of Embedded Systems

1. General situation

2. Frequent issues

3. Suitable combination of buses and processing units

4. Inside FPGAs and FPGA-SoCs

1. General situation

Many industrial research and development sectors face similar technical challenges regarding embedded systems, but they differ in how they handle them and in the internal connections (buses) they use. While the automotive industry, for example, strives for standardization and thus already addresses the requirement for reliability, some sectors of the electrical industry show a wide diversity of technical implementations. This diversity does not mean that bus systems are developed in-house; well-known standards are, of course, used. Rather, it refers to the combination of standardized bus systems used between the processing units of an embedded system. This often results in bus topologies whose sub-bus systems are not connected by bridges.

Bus systems such as USB (Universal Serial Bus), which is mostly used in consumer electronics, are also often found inside embedded systems. Reasons for this proliferation include the wide availability of sensors that can be accessed via USB and the relatively inexpensive option of connecting several so-called downstream ports to one upstream port using USB hub controllers. It is precisely this combination of supposed advantages that enables the quick development of prototypes that work in principle but often cause problems in series production.

2. Frequent issues

Among other issues, difficulties recur in the following areas.

Reliability and signal integrity

Here, the USB-based system architecture described in the previous section can be revisited. If, for example, several signal or image sensors are connected to a host system via USB hub controllers, the following deficits are often encountered:

The functional reliability of a device varies significantly in series production across several batches or even between individual devices. This becomes noticeable when individual USB devices are no longer correctly recognized by the host system, or when an established communication is interrupted during operation.
If the host system is replaced due to discontinuation or other reasons, the functional stability of the overall system may be affected. In the worst case, the stability deteriorates. One reason is the USB host controllers themselves, which can differ fundamentally even within related product families; this applies equally to industrial and consumer host systems. However, the cause of this deficit is not limited to the replaced host system. Rather, it is the combination of USB devices, USB controllers, USB cables, USB hub controllers, USB repeaters, cable lengths and the host system (including operating system, USB drivers and chipset settings) that makes the reliability of the device fluctuate so strongly across operating states and device variants. This should be taken into account when developing such a system. Correcting these deficiencies afterwards can require considerable effort, whereas they can often be avoided during the design of the system architecture.

Microcontroller as central point for data communication

It can often be observed that microcontrollers (µC) and their programs are used as central elements not only for data processing but also within the data path. The latter in particular means that data transfers from higher hierarchical levels of the embedded system to lower ones, or to end points such as sensors or actuators, are handled by software. As a result, the following deficits can arise in the overall system:

The data transfer across the layers of the embedded system behaves non-deterministically.
The real-time capability of the complete system can be reduced.
The reliability and stability of the overall system may be reduced.
The data transfer rate of a data path containing the µC can be limited by the µC.
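To make the throughput and jitter effects above more tangible, the following Python sketch models a µC that forwards every byte of a data path in software. The clock frequency, the cycles per byte and the worst-case blocking time are placeholder assumptions, not measured values; the class and its methods are hypothetical and only illustrate the arithmetic.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SoftwareDataPath:
    """Hypothetical model of a µC that forwards every byte of a data path in software."""

    cpu_hz: float           # core clock of the µC
    cycles_per_byte: float  # assumed CPU cycles spent moving one byte (read, copy, write)
    max_blocking_s: float   # assumed worst-case delay caused by other interrupts or tasks

    def max_rate(self) -> float:
        """Upper bound on forwarded bytes per second if the CPU did nothing else."""
        return self.cpu_hz / self.cycles_per_byte

    def load(self, required_rate: float) -> float:
        """Fraction of CPU time consumed by the transfer alone (> 1.0 means infeasible)."""
        return required_rate / self.max_rate()

    def latency_bounds(self, block_bytes: int) -> tuple[float, float]:
        """Best- and worst-case time to forward one block; the spread is the jitter."""
        copy_s = block_bytes * self.cycles_per_byte / self.cpu_hz
        return copy_s, copy_s + self.max_blocking_s


# Placeholder numbers, not measurements.
path = SoftwareDataPath(cpu_hz=100e6, cycles_per_byte=10, max_blocking_s=50e-6)
print(path.max_rate())           # 10 MB/s ceiling
print(path.load(12e6))           # a 12 MB/s sensor stream -> 1.2, i.e. infeasible
print(path.latency_bounds(512))  # best and worst case for a 512-byte block
```

With these assumed figures, a 12 MB/s sensor stream would need 120 % of the CPU time, and every additional interrupt source widens the gap between best-case and worst-case latency. Hardware data paths in FPGAs or ASICs largely avoid both effects because the transfer does not compete for CPU time.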

To anticipate a solution discussed later in this article (see point 4): this situation can be resolved by an ASIC-based bus extension of the embedded system or by using FPGAs or FPGA-SoCs as central or supplementary data communication elements.

Lack of adequate bus bridges

A deficit of many embedded systems that should not be underestimated is the lack of adequate bus bridges that connect the selected buses, even across different standards, and thus allow bus transfers to be translated directly from one bus to another. During development, this often affects embedded systems based on µCs, FPGAs and FPGA-SoCs alike, although it does not have to be the case with these technologies. If, on the other hand, purchased solutions such as single-board computers are used more extensively, the deficit is largely mitigated by their implicitly given sub-architecture. The lack of adequate bus bridges can lead to the following deficits in the overall system:

Often, the processing units of the embedded system only provide register accesses to the bus nodes. Additional program code or additional logic is then required to emulate the data transfer function.
Useful memory mapping functions cannot be implemented directly.
Provision of memory access across two layers of the embedded system can be difficult.
Direct memory access (DMA) across two layers of the embedded system can be difficult.
The data throughput and the latency of the entire system can be significantly affected.
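What a bridge adds can be sketched as a simple address translation: an access to a window on bus A is forwarded to a corresponding address on bus B without any software copying. The following minimal Python model assumes linear address windows; the class names, window sizes and addresses are hypothetical and not taken from any particular bus standard.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class BridgeWindow:
    """One address window that a hypothetical bridge forwards from bus A to bus B."""

    src_base: int  # window start address on the upstream bus (bus A)
    size: int      # window size in bytes
    dst_base: int  # address on the downstream bus (bus B) that src_base maps to

    def contains(self, addr: int, length: int = 1) -> bool:
        return self.src_base <= addr and addr + length <= self.src_base + self.size


class BusBridge:
    """Translates bus-A addresses into bus-B addresses for memory-mapped accesses."""

    def __init__(self, windows: list[BridgeWindow]) -> None:
        self.windows = windows

    def translate(self, addr: int, length: int = 1) -> int | None:
        """Return the bus-B address for an access or burst, or None if not forwarded.

        A burst (e.g. issued by a DMA engine) is only forwarded if it lies
        completely inside one window.
        """
        for w in self.windows:
            if w.contains(addr, length):
                return w.dst_base + (addr - w.src_base)
        return None


# Hypothetical map: bus-A window 0x4000_0000..0x4FFF_FFFF -> DDR at 0x0 on bus B.
bridge = BusBridge([BridgeWindow(src_base=0x4000_0000, size=0x1000_0000, dst_base=0x0)])
print(hex(bridge.translate(0x4000_1234)))        # 0x1234
print(bridge.translate(0x3000_0000))             # None: outside the window
print(bridge.translate(0x4FFF_FFF0, length=64))  # None: burst crosses the window end
```

Without such a bridge, each of these accesses would have to be reproduced by program code or additional logic through register accesses, which is exactly the overhead described above. Real bridges additionally convert protocol details such as burst types, data widths and clock domains, which this sketch leaves out.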

3. Suitable combination of buses and processing units

A simple and proven approach

A proven approach to designing a suitable system architecture for an embedded system is to think from the outside in. This means starting with the external requirements of the overall system and moving inwards to the detailed specification of the embedded system, following these coarse steps:

1. The first step is to list the desired interactions of the overall system with its environment, for example with sensors, actuators, displays and interfaces.
2. Next, the operational requirements of the elements identified in point 1, such as throughput, latency, operating modes and cost, are analyzed.
3. Now the appropriate processing units of the embedded system can be selected. This is done taking into account the usable interfaces of the elements from point 1, and the selection should meet the requirements from point 2.
4. Once the necessary processing units are known, the appropriate bus systems between them can be selected, unless these are already predefined by the processing units. This step may feed back into point 3.
5. Last but not least, the system architecture of the embedded software and of the programmable logic in FPGAs or FPGA-SoCs can be designed. These system architectures are part of the overall system and should not be underestimated. In principle, the procedure from points 1 to 4 can again be applied in adapted form.
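Steps 1, 2 and 4 can be illustrated with a rough throughput budget. The Python sketch below assumes that all listed elements share a single bus and that a 30 % headroom is sufficient; the element data rates are example calculations, and the bus names and capacities are placeholders, not datasheet values. Latency, operating modes and cost (step 2) would have to be checked in a similar way.

```python
from dataclasses import dataclass


@dataclass
class ExternalElement:
    """An element from step 1 with its throughput requirement from step 2."""

    name: str
    rate_bytes_per_s: float  # required sustained data rate


def required_rate(elements, headroom=0.3):
    """Aggregate data rate of all elements plus a safety margin (assumed 30 %)."""
    return sum(e.rate_bytes_per_s for e in elements) * (1 + headroom)


def bus_candidates(elements, buses, headroom=0.3):
    """Names of candidate buses whose usable throughput covers the requirement."""
    need = required_rate(elements, headroom)
    return [name for name, capacity in buses.items() if capacity >= need]


# Steps 1 and 2: hypothetical elements and their data rates in bytes/s.
elements = [
    ExternalElement("image sensor, 1920x1080, 16 bit/pixel, 30 fps", 1920 * 1080 * 2 * 30),
    ExternalElement("ADC, 4 channels x 16 bit x 1 MS/s", 4 * 2 * 1_000_000),
]
# Step 4: placeholder usable capacities in bytes/s, not datasheet values.
buses = {"bus_A": 40e6, "bus_B": 100e6, "bus_C": 400e6}

print(required_rate(elements))          # about 172 MB/s
print(bus_candidates(elements, buses))  # ['bus_C']
```

In practice, the usable throughput of a bus is lower than its nominal bit rate, so the capacities entered in such a budget should come from measurements or conservative estimates.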

The approach shown is, of course, only a coarse outline of a possible procedure and has to be adapted to the specific case.

Processing units

The choice of processing units is, of course, one of the most interesting tasks in designing an embedded system architecture. It can significantly influence the performance and flexibility, but also the size and cost of the embedded system. Since the selection of suitable processing units depends on the specific requirements, there can be no general recommendation for a specific processing unit. However, Table 1 provides a coarse overview of processing units frequently used in embedded systems, with some of their characteristics (rated relative to each other). Knowing these properties enables you to make a better selection.


	Main data processing principles	Data throughput	Latency	Deterministic behavior	Cost estimation	Energy efficiency (processing power per energy input)	Special feature
µC	Software	low	low-medium	low-medium	low	medium-high	cost-effective for simple processing
DSP	Software	low-medium	low-medium	high-very high	low	medium-high	digital signal processing at full throughput
GPU	Hardware, Software	high	medium	medium-very high	medium	medium-high	can be integrated into SoC
Single-board computer (SoC)	Software	medium-high	medium-high	very low	medium	low-medium	may include GPU
Single-board computer (embedded computer)	Software	high	high	very low	medium-high	low-medium	may include GPU
FPGA	Hardware	high-very high	very low	very high	medium-very high	high	customized processing
FPGA-SoC	Hardware, Software	very high	very low	very high	medium-very high	high	may include GPU
ACAP	Hardware, Software	very high	very low	very high	medium-very high	high	may include GPU
Individual ASIC	Hardware	very high	very low	very high	highly quantity dependent	very high	highly customized processing


Abbreviations
µC	Microcontroller	FPGA	Field Programmable Gate Array
DSP	Digital Signal Processor	FPGA-SoC	Combination of a SoC with FPGA
GPU	Graphics processing unit	ASIC	Application-specific integrated circuit
SoC	System-on-Chip	ACAP	Adaptive Compute Acceleration Platform
CPU	Central Processing Unit

Table 1: Overview of common processing units in embedded systems and their characteristics

Table 1 can only give a coarse estimate of the average rating, since the characteristics of some processing units overlap significantly and depend on many conditions, and individual products within a class may have better characteristics.

In the following, a few commonly used processing units are explained in more detail:

ASICs

Custom ASIC solutions are rarely used in ordinary embedded systems. However, if they are necessary, they usually affect core functions of the overall system, which thus becomes highly individual and powerful.

Microcontroller (µC)

Microcontrollers are a very popular element in embedded systems. They are cost-effective and usually perform elementary data processing and data transfer tasks. Unfortunately, they often significantly limit the performance and potential of the overall system (see point 2). This becomes noticeable, among other things, when subsequent functional extensions of the system can be implemented only with great effort or not at all.

Single-board Computer

Single-board computers are an extremely efficient and cost-effective class of processing units. Well-known examples are the Raspberry Pi and the NVIDIA Jetson systems. Their main advantage lies in their versatile SoCs, which allow these single-board computers to cover applications such as video processing with video encoding or decoding, DSP functions and operating system functions, as well as accelerated machine learning functions. The bus systems available in the SoC and on the board are optimized for the combination of these processing units. A weak point of single-board computers, however, is their throughput and latency in high-end applications.

FPGAs and FPGA-SoCs

If you need to implement user-specific algorithms with high throughput and low latency, FPGAs are an excellent choice. Depending on the quantity, they are less expensive than ASICs, and their function (also known as configuration) can be modified afterwards. In combination with DDR memory and suitable USB or PCIe interfaces, they can perform not only high-performance data processing but also the data transport functions of an embedded system (see point 4). If you want to integrate additional powerful software functions or operating systems, FPGA-SoCs, which combine classic FPGAs with processor systems, are of interest.

Adaptive Compute Acceleration Platform (ACAP)

ACAPs are a further development of FPGA-SoCs and represent a highly flexible acceleration platform for various types of computation. ACAPs combine SoC software functions and an area of optimized programmable logic (similar to classic FPGAs) with special engines for high-performance DSP and machine learning functions. All units are additionally interconnected by a Network-on-Chip (NoC) to provide sufficient data transfer performance between them.

4. Inside FPGAs and FPGA-SoCs

As with software, the system architecture inside FPGAs and FPGA-SoCs should not be neglected. If it is taken into account, more flexible and powerful systems can be created without higher development effort or cost. FPGAs are no longer just “glue logic” for connecting ADCs, DACs or other peripherals with special interface properties; when used, they can provide the entire backbone for data transport inside an embedded system. To illustrate this potential, Figure 1 shows an exemplary embedded system and its components, as well as the architecture inside the FPGA 1 used.

Figure 1 shows that FPGA 1 of embedded system 1 not only performs the data preprocessing for peripheral components such as image sensors, ADCs or DACs, but also stores the processed result data in an external DDR memory. For this purpose, FPGA 1 uses an internal bus system, which can be highly customized and has a significant influence on the performance of the FPGA-internal system architecture.

An external µC or SoC is not involved in the high-performance, low-latency data processing and data transfers of the FPGA. With their flexible software-based data processing, these components can focus on controlling and adjusting all components of embedded systems 1 and 2, and they also have access to the entire DDR memory. Access to all embedded systems is possible because the internal bus systems of all embedded systems are interconnected by bridges. This works even between different FPGAs, as Figure 1 shows with the FPGA bridge between FPGA 1 and FPGA 2.
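How a control processor can reach memory behind several bridges can be sketched as address decoding across a system address map. In the following Python sketch, each bus lists address windows that end either at a target or at a bridge to another bus; the bus names, addresses and window sizes are hypothetical and only loosely mirror the FPGA bridge between FPGA 1 and FPGA 2 described above.

```python
# Hypothetical system address map. Each bus lists windows as
# (base, size, kind, destination, dst_base):
#   kind "target": the window ends at a memory or register block on this bus
#   kind "bridge": the window is forwarded to another bus, starting at dst_base
ADDRESS_MAP = {
    "soc_bus": [
        (0x8000_0000, 0x4000_0000, "bridge", "fpga1_bus", 0x0000_0000),
    ],
    "fpga1_bus": [
        (0x0000_0000, 0x2000_0000, "target", "fpga1_ddr", 0x0000_0000),
        (0x2000_0000, 0x2000_0000, "bridge", "fpga2_bus", 0x0000_0000),  # FPGA bridge
    ],
    "fpga2_bus": [
        (0x0000_0000, 0x2000_0000, "target", "fpga2_ddr", 0x0000_0000),
    ],
}


def resolve(address_map, bus, addr, max_hops=8):
    """Follow bridges from `bus` until `addr` reaches a target.

    Returns (buses traversed, target name or None, address at the target).
    """
    path = [bus]
    for _ in range(max_hops):
        for base, size, kind, dest, dst_base in address_map[bus]:
            if base <= addr < base + size:
                addr = dst_base + (addr - base)
                if kind == "target":
                    return path, dest, addr
                bus = dest  # cross the bridge and decode again on the next bus
                path.append(bus)
                break
        else:
            return path, None, addr  # no window matched: decode error
    raise RuntimeError("too many bridge hops, the address map may contain a loop")


# A hypothetical control SoC reaches the DDR behind FPGA 2 via two bridges.
print(resolve(ADDRESS_MAP, "soc_bus", 0xA000_0100))
# (['soc_bus', 'fpga1_bus', 'fpga2_bus'], 'fpga2_ddr', 256)
```

Because every hop is a plain address translation, software on the control processor sees one flat address space and does not need to know how many bridges lie between it and the target memory.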

Furthermore, Figure 1 shows that the host system (e.g. a PC) is connected to the embedded systems via a dedicated bus system (such as USB, PCIe or Ethernet) through the FPGA. USB can be used here because it is a direct single-hop connection without additional USB hubs. Thanks to the bridge connection to the internal bus system, this connection can be developed independently and thus has only a minor influence on the development of the embedded system. The bridge connection also gives the host system full access to all embedded systems. If this is not desired, access should not be restricted by removing bridges but, for example, by restricting bus arbitration or by locking memory regions.
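Restricting access without removing bridges can be sketched as a per-master region check at the point where the host connection enters the internal bus system. The Python sketch below covers only the memory-region locking variant; the region names, addresses, master names and the default-deny policy are assumptions for illustration. Restricting bus arbitration would be a separate mechanism.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int
    size: int

    def contains(self, addr: int, length: int) -> bool:
        return self.base <= addr and addr + length <= self.base + self.size


class AccessGuard:
    """Hypothetical per-master check placed where the host bridge enters the internal bus.

    The bridge itself stays in place; only the policy decides what a master may do.
    """

    def __init__(self, regions: list[Region], rights: dict[str, dict[str, str]]) -> None:
        self.regions = regions
        self.rights = rights  # master -> {region name: "r" or "rw"}

    def allowed(self, master: str, addr: int, length: int, write: bool) -> bool:
        for region in self.regions:
            if region.contains(addr, length):
                granted = self.rights.get(master, {}).get(region.name, "")
                return ("w" in granted) if write else ("r" in granted)
        return False  # unmapped or region-crossing accesses are rejected (default deny)


regions = [
    Region("fpga1_ctrl", 0x4000_0000, 0x0001_0000),   # register block (hypothetical)
    Region("ddr_results", 0x8000_0000, 0x1000_0000),  # processed result data (hypothetical)
]
guard = AccessGuard(regions, {
    "host": {"ddr_results": "r"},                      # host may only read results
    "soc": {"ddr_results": "rw", "fpga1_ctrl": "rw"},  # control SoC keeps full access
})

print(guard.allowed("host", 0x8000_0000, 64, write=False))  # True
print(guard.allowed("host", 0x8000_0000, 64, write=True))   # False
print(guard.allowed("host", 0x4000_0010, 4, write=True))    # False
```

Because the bridge stays in place, access rights can later be extended, for example for debugging or service purposes, by changing the policy rather than the bus topology.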

FPGAs can be used as shown in Figure 1 to optimally complement embedded systems. In addition, as usual, the FPGAs take over the individual preprocessing of sensor data or implement customized data processing, thus fulfilling their role as hardware accelerators.