Aging, Complexity, And AI In Analog Design
Experts at the Table: Semiconductor Engineering sat down to discuss abstraction in analog vs. digital, how analog circuits age, the growing role of AI, and why there is so much margin in analog designs, with Mo Faisal, president and CEO of Movellus; Hany Elhak, executive director of product management at Synopsys; Cedric Pujol, product manager at Keysight; and Pradeep Thiagarajan, principal product manager for custom IC verification at Siemens EDA. What follows are excerpts of that conversation. To view part one of this discussion, click here. Part two is here.
L-R: Synopsys’ Elhak; Movellus’ Faisal; Siemens’ Thiagarajan; Keysight’s Pujol.
SE: How much abstraction is possible with analog design?
Pujol: We need some abstraction because we cannot accelerate the runtime forever. And AI/ML needs to be trained on something. But if it's bad data, you will get bad output. Also, with a level of abstraction, some of the knowledge goes away. And that knowledge is crucial because now, to be able to design properly, the margin that we used to have is gone. The frequency margin we had before is way, way lower. So the problems keep increasing.
Thiagarajan: We’re also lowering the boundaries between analog and digital, particularly in the RF IC space. We’re getting into 6G and 7G, and we’re talking about wireless connectivity, wireless SoC chips, and high-throughput on-die radios that are going to support Bluetooth and Wi-Fi applications. For example, think about a power amplifier. These Wi-Fi schemes are driven by digital RF modulation, the classic orthogonal frequency division multiplexing (OFDM), and they have to be processed as analog signals. These digital RF schemes have to be processed by analog PAs, which now have to perform at a higher frequency. So you’ve got RF frequencies in the upper gigahertz that need to be modulated by baseband signals in the low tens of megahertz. You’ve got to design for it. And then you’ve got to verify it, because you’re going to have power spectral density measurement issues that will show spectral regrowth due to nonlinear distortion. But you have to find a way to detect it first. So a frequency-domain analysis, like harmonic balance, has to coexist with time-domain transient verification of the low baseband signals. That is a great example of future technologies with higher frequencies, but with a digital modulation scheme driving the design of your PA.
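The spectral regrowth Thiagarajan describes can be illustrated with a quick simulation. The sketch below is purely illustrative: it assumes a toy multicarrier (OFDM-like) baseband signal and a simple memoryless third-order PA model, neither of which comes from the discussion, and uses a Welch PSD estimate to show energy spreading into the adjacent channel after the nonlinearity.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(0)

fs = 200e6            # sample rate, 200 MHz
n = 2**16
t = np.arange(n) / fs

# Toy "OFDM-like" baseband: many subcarriers with random phases,
# occupying roughly +/-10 MHz around DC.
subcarrier_freqs = np.arange(-10e6, 10e6, 312.5e3)
phases = rng.uniform(0, 2 * np.pi, len(subcarrier_freqs))
x = sum(np.exp(1j * (2 * np.pi * f * t + p)) for f, p in zip(subcarrier_freqs, phases))
x /= np.max(np.abs(x))        # normalize peak to 1

def pa(v, a1=1.0, a3=-0.15):
    """Memoryless third-order nonlinearity standing in for a compressing PA."""
    return a1 * v + a3 * v * np.abs(v)**2

y = pa(x)

# Two-sided PSDs before and after the PA. The output shows energy spreading
# outside the original +/-10 MHz channel (spectral regrowth).
f_in, pxx_in = welch(x, fs=fs, nperseg=4096, return_onesided=False)
f_out, pxx_out = welch(y, fs=fs, nperseg=4096, return_onesided=False)

adjacent = (np.abs(f_out) > 12e6) & (np.abs(f_out) < 25e6)
print("adjacent-channel power, PA input :", 10 * np.log10(pxx_in[adjacent].sum()))
print("adjacent-channel power, PA output:", 10 * np.log10(pxx_out[adjacent].sum()))
```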
Pujol: And the power amplifiers might be on a totally different technology. They can be on SiC or GaN. With GaN it’s extremely easy to get a lot of power in, but controlling it is difficult because there’s so much power inside. And when you mix all these things, it becomes pretty difficult. If you want to run modulated-signal simulation in order to measure EVM (error vector magnitude), most of the models are time-domain. You don’t have frequency-domain models in the PDK. So if you want reliability analysis on latency, that’s easy enough, because you have plenty of tools to do it. That’s time-domain, so you can estimate your latency. You also can do that with EVM, but it’s more difficult because it’s frequency-domain. There are no reliability models today that are frequency-domain, and that’s very difficult to get. So the only thing you can do is estimation. And margin is very difficult to come by, so we need models that allow us to simulate pretty much everything. That means we need the tools, the PDKs, and the models that go with them.
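For the EVM side of Pujol’s point, here is a minimal sketch of how an EVM number is typically computed from a distorted constellation. The 16-QAM source, the toy PA nonlinearity, and the noise level are all assumptions for illustration; a real EVM flow would also include pulse shaping, synchronization, and equalization.

```python
import numpy as np

rng = np.random.default_rng(1)

# Reference 16-QAM symbols, normalized to unit average power.
levels = np.array([-3.0, -1.0, 1.0, 3.0])
sym = rng.choice(levels, 1000) + 1j * rng.choice(levels, 1000)
sym /= np.sqrt(np.mean(np.abs(sym)**2))

def pa(v, a1=1.0, a3=-0.05):
    """Same kind of toy compressing nonlinearity as in the PSD sketch."""
    return a1 * v + a3 * v * np.abs(v)**2

# "Received" symbols: distorted through the PA plus a little additive noise.
noise = 0.01 * (rng.standard_normal(1000) + 1j * rng.standard_normal(1000))
rx = pa(sym) + noise

# Remove the best-fit complex gain, then EVM = RMS error vector / RMS reference.
gain = np.vdot(sym, rx) / np.vdot(sym, sym)
err = rx / gain - sym
evm = np.sqrt(np.mean(np.abs(err)**2) / np.mean(np.abs(sym)**2))
print(f"EVM: {100 * evm:.2f} %")
```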
SE: How do analog circuits age versus digital circuits? And how do they age in context with each other when they’re both in the same device?
Thiagarajan: That’s a dangerous question. If you ask an analog designer, they’re going to put as much conservativeness and fat into it as possible. You can provide the aging models, but then you have to make sure that it works well with things like self-heating. You put in a lot of margin to make sure that your channel doesn’t degrade on the transistor that you choose. So typically you will get a very conservative analog design to sustain through aging.
Faisal: Analog is almost always over-designed. The head of engineering at a company told me he didn’t want to give out a spec for analog IP. He wanted just the IP he needed, not something that was over-designed out of the box. In terms of aging, it’s more that analog will have constant currents running through it. You’ll likely never see a problem, though, because the analog designer over-designed it. Digital is different. If you have a big digital chip, not everything switches at the same time or at the same rate. You can have a part of the chip that runs at 4 GHz, or whatever, for five years, and another part of the chip that comes on only twice, for a second, in five years. That creates asymmetry in aging.
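A back-of-the-envelope calculation makes the asymmetry Faisal describes concrete. The numbers below simply restate his example (one block toggling at 4 GHz for five years, another active for only two one-second bursts) as a ratio of switching stress; the aging mechanism itself is not modeled.

```python
# Switching stress for a block toggling continuously at 4 GHz for five years
# vs. one that is active for only two one-second bursts in that time.
seconds_per_year = 365 * 24 * 3600
busy_toggles = 4e9 * 5 * seconds_per_year     # ~6.3e17 transitions
burst_toggles = 4e9 * 2                       # two 1-second bursts at 4 GHz
print(f"stress-time asymmetry: {busy_toggles / burst_toggles:.1e}x")   # ~7.9e7x
```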
SE: And thermal variation, right?
Faisal: Yes, and hot spots. The asymmetry in aging is a bigger problem than the degradation of any individual transistor. In some cases they add mock traffic — not even real traffic — just to keep everything aging evenly, but that costs a lot in overhead.
Elhak: Aging for analog design has been analyzed and taken care of for mission-critical designs, but not for all mainstream designs. In automotive and mil/aero, aging analysis has been key, and these traditionally were all-analog designs. Aging has become fashionable today for two reasons. One is that digital designers started using it, so it became more mainstream. The other is advanced nodes. It’s not just important for mission-critical applications. Advanced nodes age worse than mature nodes, so aging analysis became mainstream for other applications, as well. Even if I’m doing a consumer electronic device that will only live for a few years, if I’m doing it in an advanced node I still need to run aging analysis. Analog aging has been analyzed for many years. But the key point here, and what is different today, is that analyzing for aging is not enough. Models coming from the foundry are not always that accurate, and you cannot just build a hypothesis about how the device will behave in 15 years based on a model that was only stress-tested for a few months. There is also a need for post-silicon aging analysis, which is done with silicon lifecycle management techniques such as inserting sensors in the chip to extract things like temperature, frequency, and other parameters. That can be used to correct for aging while the chip is in the field. So there is pre-silicon aging analysis, which is now essential and needs to be done while you’re doing circuit simulation, whether in analog or digital. And there is post-silicon lifecycle management, where you are measuring the impact of aging while the device is operating in the field and correcting for it using software and digital logic that control the analog circuits.
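The post-silicon lifecycle-management loop Elhak describes might look roughly like the sketch below: firmware periodically reads on-die sensors and nudges a trim code when the measured ring-oscillator frequency has drifted from its time-zero value. All of the hooks (read_ring_osc_mhz, read_temp_c, write_bias_trim), the thresholds, and the trim width are hypothetical placeholders, not a real driver API.

```python
import time

F0_MHZ = 1250.0        # time-zero ring-oscillator frequency recorded at final test
DRIFT_LIMIT = 0.02     # allow 2% slowdown before compensating
TRIM_MAX = 31          # 5-bit trim register, purely for illustration

def aging_monitor(read_ring_osc_mhz, read_temp_c, write_bias_trim, period_s=3600):
    """Periodically compare the on-die oscillator against its time-zero value
    and step a bias/voltage trim when the slowdown exceeds the limit."""
    trim = 0
    while True:
        freq = read_ring_osc_mhz()
        temp = read_temp_c()
        drift = (F0_MHZ - freq) / F0_MHZ
        # Only compensate near the characterized temperature, so thermal
        # slowdown is not mistaken for aging.
        if 20.0 <= temp <= 40.0 and drift > DRIFT_LIMIT and trim < TRIM_MAX:
            trim += 1
            write_bias_trim(trim)
        time.sleep(period_s)
```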
SE: With digital you turn off parts that don’t work and you reroute the data. What happens with analog? Can you turn off part of the circuit?
Elhak: A traditional analog block that is purely analog, or big analog/small digital, is very hard to change or calibrate. But today, the analog blocks are designed with digital loops around them that can change every parameter in the circuit. If it’s an amplifier, then the gain can change, the phase can change, the noise can change. All of that is controlled by loops around that amplifier. If it’s a voltage-controlled oscillator, then the frequency of oscillation can be changed by these digital loops. These loops are controlled by digital circuits, which are controlled by firmware and software. Having the right sensors in the chip allows you to measure and then correct by changing the values of these parameters defined by the analysis — even the basic ones like frequency and gain.
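One common form of the digital loop Elhak mentions is a calibration routine that searches a coarse tuning code until a VCO hits its target frequency. The sketch below assumes a monotonic code-to-frequency relationship and hypothetical measure_freq_hz / write_tune_code hooks into the hardware; it illustrates the idea rather than any particular product’s loop.

```python
def calibrate_vco(measure_freq_hz, write_tune_code, target_hz, code_bits=8):
    """Binary-search a coarse tuning code until the measured VCO frequency
    is as close as possible to the target (assumes frequency rises with code)."""
    lo, hi = 0, (1 << code_bits) - 1
    best_code, best_err = lo, float("inf")
    while lo <= hi:
        code = (lo + hi) // 2
        write_tune_code(code)
        freq = measure_freq_hz()      # e.g., count VCO cycles against a reference
        err = abs(freq - target_hz)
        if err < best_err:
            best_code, best_err = code, err
        if freq < target_hz:
            lo = code + 1
        else:
            hi = code - 1
    write_tune_code(best_code)
    return best_code
```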
Pujol: It’s all about correcting the analog through the digital.
Faisal: Digital loops fix it and blur the boundaries, but under the hood there’s still a bit of over-design. When it comes to aging, it affects analog and digital differently. In digital, things will just slow down. If you haven’t accounted for it in analog, though, and your transistors go out of bias, all of a sudden your amplifier becomes a resistive ladder. That’s drastic. You’re done.
SE: And if you drop the voltage even lower, that becomes a separate problem, right?
Faisal: Yes, exactly. It makes it more susceptible to noise.
SE: Noise becomes very interesting here because it may depend on what’s in the vicinity of a particular block or chiplet. Tolerances become tighter if you drop the voltage too much, right?
Thiagarajan: Noise has two different angles. There is predictable noise, and there’s just random noise. Random noise is something you design for. If you design a VCO (voltage-controlled oscillator), it’s based on the devices that are part of it. So the VCO would contribute to random jitter. The predictable or deterministic aspect is when you know you’re going to get this noise because of something else you have. The classic example is that we’re tying all these IP blocks to the same power supply on this one domain, but then it gets tied to other IPs at the package level. There are all these noise signatures for all these IPs banging away, but you may not even have considered them in the design of the first set of IPs. So now you see new frequency noise signatures at the system level, and that ends up affecting your other architectures that did not consider it. And so you’ve got this unintended coupling, because of this deterministic noise due to supply sharing at the die level, at the package level, and the board level, and that’s where you need filtering.
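The distinction Thiagarajan draws between random and deterministic noise shows up clearly in a spectrum. The sketch below builds a phase record from broadband random jitter plus a single supply-coupled tone (the 37 kHz frequency and both amplitudes are arbitrary assumptions) and shows the deterministic contribution appearing as a discrete spur well above the random noise floor.

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(2)

fs = 1e6               # one phase sample per reference cycle, for illustration
n = 2**18
t = np.arange(n) / fs

random_jitter = 1e-3 * rng.standard_normal(n)      # broadband, from the devices
supply_spur = 5e-3 * np.sin(2 * np.pi * 37e3 * t)  # deterministic, supply-coupled tone
phase = random_jitter + supply_spur                # total excess phase, radians

f, pxx = welch(phase, fs=fs, nperseg=8192)
spur_bin = np.argmin(np.abs(f - 37e3))
print("PSD at the 37 kHz spur:", 10 * np.log10(pxx[spur_bin]), "dB(rad^2/Hz)")
print("median noise floor    :", 10 * np.log10(np.median(pxx)), "dB(rad^2/Hz)")
```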
SE: And with advanced packaging, you also have substrate noise that you typically never considered in the past?
Pujol: Yes, and a chiplet will be even worse with substrate noise. I was at an automotive company, and they were detecting something weird on the cable for their braking system. It was Bluetooth. Of course that’s bad, because it’s a car. But if you put that in a chiplet, it’s the same thing. In a base station, you never know how much power is in the adjacent channel. You need to be very agile in the way you design your circuits so that you’re immune to the noise, and that’s getting difficult with the low margins we now have. This can affect everything in the channel, too. Memory effects come from analog signals and analog biasing, and if you look at the transmission, most of the problems are coming from the analog bias. That goes back to the analog problem. If it’s an RF transmission, of course the channel will give you attenuation, but most of the effects are coming from analog biasing. So we need to be sure we design that well, because everything will be connected to this analog block, whatever the channel. Whether it’s a car, a chiplet, a board, whatever, this is the limiting factor in the transmission most of the time. And because we have well-defined aging in the analog block, we could predict it for RF, as well, because this is what degrades.
Elhak: Going back to AI, there are situations where this process can be automated without taking the fun out of analog, like optimizing the design from one version of the device model to another, or moving the design from one node to another. So a big GPU maker may move from one version to another because there are all kinds of digital bells and whistles in the new GPU. But they would probably have the same PCIe interface in both, and they still have to redesign it from scratch in a new node because it has to be on the same chip as the new GPU. These are use cases where we can automate analog design. The intelligent part is still the analog designers, but the repetitive part is now done by the machine.
Faisal: Analog design is closer to what AI does than digital design, which is very sequential and linear. The analog designer has to think about 20 different parameters and tradeoffs. That’s why experience is so important. AI is actually more relevant and useful for solving analog problems. It’s still linear algebra, but there are a lot of parameters to keep track of.
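As a toy illustration of the many-knob tradeoff Faisal describes, the sketch below wraps an entirely made-up amplifier cost model (square-law transconductance, a gain/bandwidth/power figure of merit) in a plain random search. An AI-assisted flow would replace the search with a learned model, but the shape of the problem, many coupled parameters feeding one figure of merit, is the same.

```python
import numpy as np

rng = np.random.default_rng(3)

def figure_of_merit(p):
    """Toy cost: hit 40 dB of gain and 100 MHz of bandwidth at minimum power."""
    w, l, ibias, cload, rload = p
    gm = np.sqrt(2 * 0.5e-3 * (w / l) * ibias)       # square-law transconductance
    gain = gm * rload
    bandwidth = 1.0 / (2 * np.pi * rload * cload)
    power = 1.2 * ibias
    return (max(0.0, 40.0 - 20 * np.log10(gain))**2
            + max(0.0, 100e6 - bandwidth)**2 / 1e12
            + 1e6 * power)

# Search bounds: W (um), L (um), Ibias (A), Cload (F), Rload (ohm).
lo = np.array([1.0, 0.1, 10e-6, 50e-15, 1e3])
hi = np.array([50.0, 1.0, 500e-6, 500e-15, 50e3])

best_p, best_cost = None, float("inf")
for _ in range(20000):
    p = lo + (hi - lo) * rng.uniform(size=5)
    cost = figure_of_merit(p)
    if cost < best_cost:
        best_p, best_cost = p, cost

print("best cost      :", best_cost)
print("best parameters:", best_p)
```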