Cost-Effective EDA Data Storage And Processing

Engineering teams are turning to the cloud to process and store increasing amounts of EDA data, but while the compute resources in hyperscale data centers are virtually unlimited, the move can add costs, slow access to data, and raise new concerns about sustainability.

For complex chip designs, the elasticity of the cloud is a huge bonus. With advanced-node chips and packaging, the amount of data that needs to be processed is exploding. Today, roughly 45% of all advanced-node chips are being designed by large systems companies with their own cloud operations, which at least for now provides them with a time-to-market advantage. But as more chipmakers push down into the angstrom range, and as heterogeneous integration goes mainstream, more chip/chiplet companies will begin leveraging design tools in the cloud because it eliminates the up-front investment in expensive tools and data centers.

But which data gets processed in the cloud and where it gets stored will depend on how much chipmakers trust the security of the cloud, and what exactly is being sent there to be processed. In all likelihood, many chipmakers will use a hybrid scheme, keeping the most sensitive data on-premises and leveraging the cloud wherever massive compute power is required. This is far from the most efficient approach, however.

“As AI gets increasingly incorporated into the design flow, the amount of data needed to train models will continue to grow,” said Rob Knoth, group director for strategy and new ventures at Cadence. “We are operating on a million-to-billion scale of sensors, cell phones, cameras, and cars. There are 6G wireless transceivers and repeaters. This is where a more intelligent power optimization flow in the semiconductor design process can have giant ramifications. Semiconductors becoming more pervasive in our society allows intelligence to be put into more of these applications. It makes all our lives easier. But you can’t swing a dead cat without hitting an article about AI data centers consuming too much power. They’re saying, ‘The world’s going to get sucked into a black hole of training power efficiency.’ Semiconductors are becoming more pervasive, and that means that the orders of magnitude start getting multiplied here.”

Power consumption in large data centers has grown at a 16% compound annual rate since 2016, and a number of reports predict that growth will accelerate rapidly with the rollout of large language models. Even where sufficient electricity is being generated, power distribution grids are struggling to keep pace. “If you look at the ramifications of what’s happening with AI and hyperscalers missing their sustainability targets because of their investments, this needs to be talked about,” said Knoth.
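
For scale, a quick compounding check shows what a 16% annual rate implies (a minimal sketch; the 2024 end year is an assumption):

```python
# Compounding check: 16% CAGR from 2016 through 2024 (assumed end year)
# multiplies data-center power draw by roughly 3.3x.
years = 2024 - 2016
print(f"growth factor over {years} years: {1.16 ** years:.2f}x")
```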

Bottom line vs. sustainability
Sustainability is a big buzzword in the tech world, and one of the chief concerns in developing EDA software is ensuring that it does no harm in terms of compute power.

“I see this as we develop our own software,” Knoth said. “It takes so much compute power to regress the software to make sure that if this one test case improved by 10%, how did these other 100 test cases turn out? You have to make sure that the software is getting more effective, but it’s also doing no harm, because it’s now optimizing and creating so many different types of semiconductors.”

Dean Drako, co-founder and CEO of IC Manage, contends that in the grand scheme of things, the amount of electricity it takes to fuel the EDA process is a drop in the bucket when compared to where the chips ultimately end up. Any efforts to reduce compute are about the bottom line, not about any desire to achieve more sustainable practices.

“Who does big chip design?” Drako said. “You’ve got to look at each one of them as potentially motivated differently. They all want to make money. They all have to design chips. The amount of power they use to design the chips is probably 0.01% of the power of the chips that they actually manufacture, so does it really matter how much power you use to design the chip? It doesn’t seem like it.”

Jim Schultz, product marketing manager at Synopsys, observes there is a sustainability issue at play during the EDA process, but it’s not one that has much to do with the environment. Rather, it’s about preserving resources in order to speed up run times, particularly when farming out the compute to different machines can hit a point of limited return. “There you can say, ‘Hey, there’s a sustainability issue. You don’t want to waste the resources and the compute resources to get maybe 5% faster run time. It’s not worth it.’”
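
The limited return Schultz describes is essentially Amdahl’s law. Here is a minimal sketch, assuming a fixed 5% serial fraction in the flow (the fraction is an illustrative assumption):

```python
# Illustrative sketch (not from the article): Amdahl's law caps the speedup
# from farming a job out to more machines once the serial fraction dominates.

def speedup(n_machines: int, serial_fraction: float) -> float:
    """Ideal speedup for a workload with a fixed serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_machines)

serial = 0.05  # assume 5% of the flow cannot be parallelized
for n in (1, 8, 16, 32, 64, 128):
    print(f"{n:>4} machines -> {speedup(n, serial):5.2f}x speedup")
# Doubling from 64 to 128 machines adds only ~13% more speedup here --
# the point of limited return Schultz describes.
```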

Conveniently, efforts to reduce carbon footprint and to cut costs boil down to essentially the same thing. “The good thing is, they’re very much aligned,” said Bill Mullen, a fellow at Ansys. “Using more power and more computers is wasteful, and it costs you in the bottom line. Everybody’s incentivized to be more efficient at what they do. And this has a long history in EDA. All of our customers want us to optimize our tools to make them more efficient.”

Local data vs. the cloud
Even if sustainability is not necessarily the primary goal, maintaining a more efficient level of compute is. It’s a finite internal resource for most companies, and one that requires forethought when being portioned out for a project. The EDA process is generating huge amounts of data, and that data needs to be stored somewhere. But whether that data gets analyzed and stored on-premises or in the cloud is less about running out of disk space and more about a number of other concerns.

“When you have an internal data center, the constraints are pretty hard,” said Mullen. “If you want to add resources, usually it takes months to order the systems, get them installed with everything involved, so you’ve got a hard limit. If a company has to tape out by a certain date, they’re going to be stuck. Using those resources on the cloud, the limit isn’t so much the number of physical machines or the amount of storage available. It’s cost. You have a certain OpEx budget that’s available, and you can’t just spend infinite amounts of money to bring that in. Everybody’s constrained in one way or the other. You just have to look at what your needs are and make the right tradeoffs.”

On the plus side, the cloud enables things like burst computing, where you need additional resources for a short period of time. “It also offers more resources of certain types than you could possibly get on-premises. You might need a lot of GPUs, which are impractical to get in your own data center, and they may be available in the cloud, for example. There are many factors that go into this,” Mullen noted.

Tim Thornton, director of Arm-based engineering, said that by using spot resources, rather than on-demand, costs can be managed more effectively. “The benefits of flexibility in scale and access to new compute hardware are worth having. While not all workloads map well to spot, technologies from vendors such as Exostellar provide persistence in spot instances that can address that use case.”
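
As a rough illustration of the spot-versus-on-demand tradeoff Thornton describes, consider a simple cost model. The hourly rates, discount, and interruption overhead below are assumptions for illustration, not figures from any provider:

```python
# Hypothetical cost comparison of on-demand vs. spot instances for a batch
# EDA job. All prices and the interruption overhead are assumed values.

def job_cost(hours: float, rate: float, interruption_overhead: float = 0.0) -> float:
    """Cost of a job, inflating runtime by expected re-run overhead."""
    return hours * (1.0 + interruption_overhead) * rate

on_demand_rate = 3.00   # $/hour, assumed
spot_rate = 0.90        # $/hour, assumed (a steep spot discount)
runtime_hours = 200.0

print(f"on-demand: ${job_cost(runtime_hours, on_demand_rate):,.2f}")
# Spot capacity can be reclaimed; assume interruptions add 20% re-run time.
print(f"spot:      ${job_cost(runtime_hours, spot_rate, 0.20):,.2f}")
```

Even with a 20% re-run penalty, the spot job here costs roughly a third of the on-demand run, which is why persistence technologies that tame interruptions make spot attractive.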

On the other hand, speed is often a key consideration for any project. Mullen said that geography can play a big role in deciding whether to turn to the cloud. “You have to consider how frequently you run that workflow. Do you have to move data from one location to another? If you have everything located in one site where your implementation and analysis is together, that’s usually more optimal than having to send terabytes of data across the country or something like that.”
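
The arithmetic behind that concern is straightforward. A back-of-the-envelope sketch, with the link speed and utilization as assumed values:

```python
# Back-of-the-envelope sketch: how long does it take to move design data
# over a WAN link? Bandwidth and utilization figures are assumptions.

def transfer_hours(terabytes: float, gbps: float, efficiency: float = 0.8) -> float:
    """Hours to move `terabytes` over a `gbps` link at given utilization."""
    bits = terabytes * 8e12                      # TB -> bits (decimal units)
    seconds = bits / (gbps * 1e9 * efficiency)
    return seconds / 3600.0

for tb in (1, 10, 50):
    print(f"{tb:>3} TB over 10 Gb/s: {transfer_hours(tb, 10):6.1f} hours")
# 50 TB at 10 Gb/s is roughly 14 hours -- dead time in a tape-out schedule.
```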

To Drako, the decision about whether to use cloud-based computing versus on-prem is just the latest iteration of a problem that dates back 50 years, namely using today’s computers to design the more complicated ones to follow. That loop requires constant replacement of on-prem hardware, a cost that can be avoided by turning to the cloud. But doing so raises costs in the short term, an expense many chipmakers opt to avoid.

“But then you also have this headache of, are we good at running data centers?” Drako said. “Do we have the right people? Do we want to do it or not? What do we want to get out of it? There are all kinds of issues with it, don’t get me wrong. You’ve got to manage operating systems. You’ve got to do upgrades. You have hardware failures. You have to have people physically doing stuff. But in the end, it boils down to cost and speed.”

Schultz proposed a solution that he believes offers the best of both worlds, avoiding the pain of sending terabytes of data back and forth, which can cost run time, while still allowing a designer to reap the benefits of working from the cloud.

“If I were going to do an implementation, I want to put my initial data in the cloud, and I want to run the whole thing in the cloud,” Schultz explained. “I don’t want to do parts of it on-prem and then parts in the cloud, because if I were to just do the actual implementation and I had to transfer terabytes of data, you’re going to wait for all of that data to get transferred. Oftentimes what we will do is make use of the local disk on the machine, not even wanting to store key data in the internal network. So, if I get a very large machine, I make use of its temporary disk space as memory storage. It will be much faster than going through the internet to deal with it.”
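
A minimal sketch of the staging pattern Schultz describes: copy inputs onto the instance’s fast local scratch disk once, run the tool there, and copy only the results back. The paths and the tool command are hypothetical:

```python
# Sketch of the local-scratch staging pattern. "/local_scratch" and
# "pnr_tool" are hypothetical; substitute the instance's NVMe mount and
# the actual EDA tool invocation.

import shutil
import subprocess
import tempfile
from pathlib import Path

def run_with_local_scratch(inputs: Path, results_dest: Path) -> None:
    with tempfile.TemporaryDirectory(dir="/local_scratch") as scratch:
        work = Path(scratch)
        shutil.copytree(inputs, work / "design")            # stage inputs in once
        subprocess.run(
            ["pnr_tool", "-design", str(work / "design")],  # hypothetical tool
            cwd=work,
            check=True,
        )
        shutil.copytree(work / "results", results_dest)     # stage results out only
```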

While the data centers behind AI models are consuming vast amounts of compute, Mullen believes the power of machine learning ultimately can help lower the compute demands of EDA. There is an initial cost of training, but much more than that can be gained back.

“Some examples are that you use the ML to be more optimal in how you partition a problem, or how your algorithm works, or you trained it on certain designs,” he noted. “We have a thermal analysis capability. Instead of calculating in great detail, using traditional approaches, how temperature is dissipated within a 3D-IC, we can do it much more efficiently with an ML model. If you have a model that’s generalizable, that can be very efficient.”
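
A toy sketch of that surrogate-model idea (not Ansys’ actual implementation): train a regressor on power maps labeled by a detailed solver, then predict peak temperature cheaply for new designs. The labels below are synthetic stand-ins for real solver output:

```python
# Illustrative ML surrogate for thermal analysis. Training labels are a
# synthetic stand-in for detailed-solver results; in practice each label
# would come from a full 3D-IC thermal simulation.

import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Each sample: a coarse 64-tile power map (watts). Label: peak temperature.
X_train = rng.uniform(0.0, 2.0, size=(500, 64))
y_train = 45.0 + 12.0 * X_train.max(axis=1) + rng.normal(0.0, 0.5, 500)

model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Inference costs milliseconds, versus hours for a detailed solve.
new_power_map = rng.uniform(0.0, 2.0, size=(1, 64))
print(f"predicted peak temperature: {model.predict(new_power_map)[0]:.1f} C")
```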

That sentiment was echoed by Thornton, who believes AI tools actually can be used to reduce the amount of compute needed. “As an example, by learning which tests exercise certain parts of a core, optimizations can be made on regression suites to achieve the required level of coverage while reducing the number of individual tests required to achieve it,” he said.
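
One common way to implement the pruning Thornton describes is a greedy set cover over coverage data. A minimal sketch, with made-up test names and coverage points:

```python
# Greedy set cover: pick a small test subset that still hits every required
# coverage point. Test names and coverage data are illustrative only.

def select_tests(coverage: dict[str, set[str]], required: set[str]) -> list[str]:
    chosen, remaining = [], set(required)
    while remaining:
        # Pick the test covering the most still-uncovered points.
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        if not coverage[best] & remaining:
            break  # leftover points are unreachable with these tests
        chosen.append(best)
        remaining -= coverage[best]
    return chosen

coverage = {
    "test_alu_add": {"alu.add"},  # fully redundant with test_smoke
    "test_alu":     {"alu.add", "alu.sub"},
    "test_lsu":     {"lsu.load", "lsu.store"},
    "test_smoke":   {"alu.add", "lsu.load", "fetch.br"},
}
required = {"alu.add", "alu.sub", "lsu.load", "lsu.store", "fetch.br"}
print(select_tests(coverage, required))  # 3 of the 4 tests suffice
```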

To Schultz, the key to improving compute efficiency lies in proper data analytics, which he says can aid both individual engineers and management. “You would want to get data analytics products in your flow because, while the number of chip starts is rising rapidly, the number of experienced engineers is not. You don’t have as many experienced engineers for as many chips as are being started out there, so debug by an inexperienced engineer takes longer. If you want to be more efficient, using data analytics can help the engineer quickly spot things that they can improve on. It also can help the organization become more efficient with disk management, because they can put a policy in place for data retention.”
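
A retention policy like the one Schultz mentions can be as simple as a periodic sweep for stale run directories. A minimal sketch, with the scratch path and age threshold as assumptions:

```python
# Sketch of a disk-retention sweep: report run directories untouched for
# more than N days. The root path and threshold are hypothetical; a real
# policy might archive or delete instead of just printing.

import time
from pathlib import Path

RETENTION_DAYS = 90
SCRATCH_ROOT = Path("/proj/scratch")  # hypothetical project scratch area

def stale_run_dirs(root: Path, days: int):
    cutoff = time.time() - days * 86400
    for run_dir in root.iterdir():
        if run_dir.is_dir() and run_dir.stat().st_mtime < cutoff:
            yield run_dir

for d in stale_run_dirs(SCRATCH_ROOT, RETENTION_DAYS):
    print(f"stale: {d}")
```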

Conclusion
The need for EDA to become more efficient is being driven by project schedules and budget limitations. Both on-premises and cloud storage have their pros and cons, and choosing between them for a project can be a difficult decision that requires weighing cost and speed requirements.

For some in the industry, there also is the sustainability concern. Power grids will soon be pushed to their limits by data centers. And while the EDA process requires a mere fraction of the compute and electricity needed to power an AI model, some see in that an ecological and social responsibility, but also an opportunity to push for more innovative solutions. But this is set against the high cost of entry, be it an on-premises data center or purchasing more cloud space.

“It’s easy to jump onto a bandwagon or a hype cycle and get scared about the cost to sit at the table,” said Cadence’s Knoth. “But true and good engineering and science has never shirked away from that fear. It’s about really grappling with it, and doing good work. Then we can have a major positive impact.”

Related Reading
IC Industry’s Growing Role In Sustainability
Addressing energy consumption has become a requirement as AI takes root, but it requires changes across the entire ecosystem.
IC Manufacturing Targets Less Water, Less Waste
New technologies and processes help companies strive for net-zero.
