谷歌母公司Alphabet董事会主席John Hennessy:AI技术飞速发展,但我们正处于半导体产业寒冬 | 钛媒体T-EDGE


deep learning for several small examples but certainly AlphaGo defeating the world’s go champion at least ten years before it was expected was a dramatic breakthrough. It relied on deep learning technologies, and it exhibited what even professional go players would say was creative play.

在深度学习上我们见证了重大突破。最著名的例子应该就是 AlphaGo 击败了围棋世界冠军,这一突破比预期提前了大约十年。AlphaGo 运用的正是深度学习技术,甚至连职业围棋棋手也称赞 AlphaGo 的棋路颇具创造性。

That was the beginning of a world change.

这是巨变的开端。

Today we've seen many other deep learning breakthroughs where deep learning is being used for complex problems, obviously crucial for image recognition which enables self-driving cars, becoming more and more useful in medical diagnosis, for example, looking at images of skin to tell whether or not a lesion is cancerous or not, and applications in natural language particularly around machine translation.

现今,深度学习也在其他领域取得重大突破,被应用于解决复杂的问题。其中最显著的例子是图像识别技术,它让自动驾驶成为可能。图像识别在医学诊断中也越来越有用,例如通过查看皮肤图像判断病变是否为癌变。除此之外,还有在自然语言处理中的应用,尤其是在机器翻译方面取得进展。

Now for Latin-based languages it's basically as good as professional translators, and it's improving constantly for Chinese to English, a much more challenging translation problem, but we are seeing significant progress even there.

如今,拉丁语系的机器翻译已经能够做到和专业翻译人员相近的水平。在更具挑战性的汉英翻译方面,机器翻译也在不断改进,我们已经能看到显著的进步。

Most recently we've seen AlphaFold 2, DeepMind's approach to using deep learning for protein folding, which advanced the field by at least a decade in terms of what is doable in applying this technology to biology, and which is going to dramatically change the way we do new drug discovery in the future.

最近我们又有了 AlphaFold 2,这是 DeepMind 运用深度学习进行蛋白质折叠预测的方法。就将这项技术应用于生物学而言,它让这个领域前进了至少十年,将极大地改变未来新药研发的方式。

What drove this incredible breakthrough in deep learning? Clearly the technology concepts have been around for a while, and in fact in many cases had been discarded earlier.

是什么让深度学习取得了这些突破?显然,这些技术概念已经存在很长时间了,在某种程度上还曾被抛弃过。

So why was it able to make this breakthrough now?

那么为什么现在我们能取得突破呢?

First of all, we had massive amounts of data for training. The Internet is a treasure trove of data that can be used for training. ImageNet was a critical tool for training image recognition. Today, close to 100,000 objects are on ImageNet and more than 1000 images per object, enough to train image recognition systems really well. So that was the key.

首先是我们有了大量可用于训练 AI 的数据。互联网是数据的宝库。例如 ImageNet,就是训练图像识别的重要工具。如今 ImageNet 上有近 100,000 类物体,每类物体有超过 1000 张图像,这足以让我们很好地训练图像识别系统。这是关键因素之一。

Obviously we have lots of other data we're using here, whether it's protein folding or medical diagnosis or natural language; we're relying on the data that's available on the Internet that's been accurately labeled to be used for training.

我们当然也使用了其他大量的数据,无论是蛋白质折叠、医学诊断还是自然语言处理方面,我们都依赖互联网上的数据。当然,这些数据需要被准确标注才能用于训练。

Second, we were able to marshal mass of computational resources primarily through large data centers and cloud-based computing. Training takes hours and hours using thousands of specialized processors. We simply didn't have this capability earlier. So that was crucial to solving the training problem.

第二,大型数据中心和云计算为我们带来了大量的算力资源。使用数千个专用处理器进行训练往往需要连续许多个小时,而我们以前根本不具备这种能力。因此,算力也是解决训练问题的关键因素。

I want to emphasize that training is the computationally intensive problem here. Inference is much simpler by comparison. Here you see the rate of growth of performance demand, in petaflop/s-days, needed to train a series of models. If you look at training AlphaZero for example, it requires 1000 petaflop/s-days, roughly a week on the largest computers available in the world.

我想强调的是,人工智能训练才是计算密集的难题,相比之下推理要简单得多。这里展示的是训练一系列模型所需的性能需求(以 petaflop/s-day 计)的增长速度。以训练 AlphaZero 为例,它需要 1000 个 petaflop/s-day,也就是说用世界上最大的计算机来训练也要花上大约一周。
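下面用一个极简的 Python 算式验证这一换算(其中机器峰值取约 150 PFLOP/s,只是示意性假设,并非演讲原文给出的数字):

```python
# petaflop/s-day:以每秒 10^15 次浮点运算的速度持续一天的计算量
ALPHAZERO_COST_PF_DAYS = 1000      # 原文给出的训练开销
MACHINE_PEAK_PFLOPS = 150          # 假设的超算峰值(PFLOP/s),仅作示意
days = ALPHAZERO_COST_PF_DAYS / MACHINE_PEAK_PFLOPS
print(f"约需 {days:.1f} 天")        # ≈ 6.7 天,与“大约一周”的说法一致
```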

This speed has been growing actually faster than Moore's law. So the demand is going up faster than what semiconductors ever produced, even in the very best era. We've seen a 300,000 times increase in compute from training simple models like AlexNet up to AlphaGo Zero, and new models like GPT-3 have billions of parameters that need to be set. So the training and the amount of data they have to look at is truly massive. And that's where the real challenge comes.

这个增长速度实际上比摩尔定律还要快。因此,即使在半导体行业最鼎盛的时代,需求的增长也比半导体能够提供的更快。从训练 AlexNet 这样的简单模型到 AlphaGo Zero,算力需求已经增长了 300,000 倍,而 GPT-3 等新模型有数十亿个参数需要设定。训练以及需要处理的数据量确实极其庞大,这正是真正的挑战所在。
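可以用一个简单的算式直观感受“比摩尔定律还快”意味着什么(其中约 6 年的时间跨度是示意性假设,30 万倍取自原文):

```python
import math

years = 6                                    # AlexNet 到 AlphaGo Zero 的大致时间跨度(假设)
moore = 2 ** (years / 2)                     # 摩尔定律:每两年翻一番
demand = 300_000                             # 原文给出的算力需求增长倍数
doubling_months = years * 12 / math.log2(demand)
print(f"摩尔定律约 {moore:.0f} 倍;需求增长 {demand} 倍,相当于每 {doubling_months:.1f} 个月翻一番")
```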

Moore's law, the version that Gordon Moore gave in 1975, predicted that semiconductor density would continue to grow quickly and basically double every two years, but we began to diverge from that. The divergence really began around 2000, and then the spread grew even wider. As Gordon said on the 50th anniversary of the first prediction: no exponential is forever. Moore's law is not a theorem or something that definitely must hold true. It's an ambition which the industry was able to focus on and keep on track with. If you look at this curve, you'll notice that for roughly 50 years we dropped only a factor of 15 while gaining a factor of almost 10,000.

摩尔定律,即戈登·摩尔在 1975 年给出的版本,预测半导体密度将继续快速增长,基本上每两年翻一番,但我们开始偏离这一曲线。偏离大约从 2000 年开始,而且差距越来越大。正如戈登在最初预测发表五十周年时所说:没有哪个指数增长能够永远持续。摩尔定律不是定理,也不是必然成立的规律,它是一个整个行业能够聚焦并努力跟上的目标。仔细观察这条曲线,你会注意到在大约 50 年里,我们仅落后了约 15 倍,而总共获得了近 10,000 倍的增长。

So we've largely been able to keep on this curve, but we began diverging, and when you factor in the increasing cost of new fabs and new technologies, you see this curve, when it's converted to price per transistor, not dropping nearly as fast as it once fell.

所以我们大体上一直保持在这条曲线上,但我们开始跟不上了。如果再考虑到新晶圆厂和新技术成本的增加,把它换算成每个晶体管的价格,你会发现这条曲线的下降速度已不像从前那么快了。

We also have faced another problem, which is the end of so-called Dennard scaling. Dennard scaling is an observation made by Robert Dennard, the inventor of the DRAM that is ubiquitous in computing technology. He observed that as dimensions shrunk, so would the voltage and other parameters, and that would result in nearly constant power per square millimeter of silicon. That meant, because the number of transistors in each square millimeter was going up dramatically from one generation to the next, that power per computation was actually dropping quite quickly. That really came to a halt around 2007, and you see this red curve, which was going up slowly at the beginning between 2000 and 2007, really began to take off. That meant that power was really the key issue, and figuring out how to get energy efficiency would become more and more important as these technologies went forward.

我们还面临另一个问题,即所谓登纳德缩放定律的终结。登纳德缩放是由罗伯特·登纳德提出的一项观察,他是 DRAM 的发明人。他观察到,随着尺寸缩小,电压等参数也会随之降低,这将使单位面积硅片的功率基本保持恒定。这意味着,尽管每平方毫米上的晶体管数量一代比一代急剧增加,每次计算的功耗实际上下降得相当快。这种趋势在 2007 年左右戛然而止:你可以看到这条红色曲线,在 2000 年到 2007 年间还只是缓慢上升,之后便开始飙升。这意味着功耗成了真正的关键问题,随着这些技术的发展,弄清楚如何获得更高的能效将变得越来越重要。
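登纳德缩放的核心关系可以用一个极简的数值示意来说明(以下的缩放比例 s 和理想化公式是常见的教科书式假设,并非演讲中给出的原始推导):

```python
# 特征尺寸缩小为 1/s:电压、电容同步缩小,频率提高 s 倍,
# 单位面积晶体管数增加 s^2 倍,功率密度 ∝ C·V^2·f × 晶体管数/面积 基本不变。
s = 1.4                              # 一代工艺的线性缩放比例(示意值)
C, V, f = 1 / s, 1 / s, s            # 单个晶体管的电容、电压缩放,频率提升
per_transistor_power = C * V**2 * f   # ∝ 1/s^2
transistors_per_area = s**2
power_density = per_transistor_power * transistors_per_area
print(power_density)                  # ≈ 1.0:功率密度近似恒定,这正是 2007 年前后失效的前提
```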

The combined result of this is that we've seen a leveling off of uniprocessor performance, single core performance, after going through a rapid growth in the early period of the industry of roughly 25% a year, then a remarkable period, with the introduction of RISC technologies and instruction-level parallelism, of over 50% a year, and then a slower period which focused very much on multicore and building on these technologies.

综合来看,单处理器(单核)性能的增长已趋于平稳:在经历了行业早期每年大约 25% 的增长之后,随着 RISC 技术和指令级并行的引入,出现了每年超过 50% 的高速增长;随后进入增速放缓的时期,重点转向多核以及在现有技术上的深耕。

In the last two years, only less than 5% improvement in performance per year. Even if you were to look at multicore designs with the inefficiencies that come about you see that that doesn't significantly improve things across this.

在最近两年中,每年的性能提升不到 5%。即使考虑多核设计及其带来的低效,整体情况也没有显著改善。
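把演讲中提到的几个阶段的年增长率做一个粗略的复利计算,可以更直观地看出差距(其中 52% 和 3.5% 取自 Hennessy 和 Patterson 教材中的常用数字,属于补充假设,演讲原文只说“超过 50%”和“不到 5%”):

```python
# 不同年增长率下,十年的累计性能提升
for label, annual in [("早期 ~25%/年", 0.25), ("RISC/指令级并行 ~52%/年", 0.52), ("近年 <5%/年", 0.035)]:
    print(label, f"十年累计约 {(1 + annual) ** 10:.0f} 倍")
# 大致输出 9 倍、66 倍、1.4 倍 —— 这就是“单核性能增长趋于停滞”的含义
```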

And indeed we are in the era of dark silicon, where multicores often slow down or shut off a core to prevent overheating, and that overheating comes from power consumption.

事实上,我们已经处于所谓的“暗硅”时代:多核处理器常常不得不降频或关闭某个核心以防止过热,而过热正是来自功耗。

So what are we going to do? We're in a dilemma here: we've got a new technology, deep learning, which seems able to do problems that we never thought we could do, quite effectively. But it requires massive amounts of computing power to go forward, and at the same time the slowing of Moore's law and the end of Dennard scaling is creating a squeeze on the ability of the industry to do what it relied on for many years, namely just get the next generation of semiconductor technology and everything gets faster.

那么我们该怎么办呢?我们陷入了两难境地:一方面,我们拥有深度学习这项新技术,它似乎能够相当有效地解决我们过去认为无法解决的问题,但它需要海量算力才能继续前进;另一方面,摩尔定律的放缓和登纳德缩放定律的终结,正在挤压这个行业多年来赖以生存的能力——我们再也不能指望半导体技术的下一代更新让一切自动变快。

So we have to think about a new solution. There are three possible directions to go.

因此,我们必须考虑新的解决方案。这里有三个可能的方向。

Software centric mechanisms, where we look at improving the efficiency of our software so it makes more efficient use of the hardware, in particular the move to scripting languages, such as Python for example, that are dynamically typed. They make programming very easy but they're not terribly efficient, as you will see in just a second.

以软件为中心的机制。我们着眼于提升软件的效率,以便更高效地利用硬件,特别是要注意向脚本语言的转变,例如动态类型的 Python。这些语言让编程变得非常容易,但它们的效率并不高,接下来我会详细解释。

Hardware centric approaches. Can we change the way we think about the architecture of these machines to make them much more efficient? This approach is called domain specific architectures or domain specific accelerators. The idea is to do just a few tasks but to tune the hardware to do those tasks extremely well. We've already seen examples of this in graphics, for example, or the modem that's inside your cell phone. Those are special purpose architectures that use intensive computational techniques but are not general purpose. They are not programmed for arbitrary things; they are designed only to do a range of graphics operations or the operations required by a modem.

以硬件为中心的方法。我们能否改变对这些机器体系结构的思考方式,使它们更加高效?这种方法称为特定领域架构(domain specific architecture)或特定领域加速器。思路是只做少数几类任务,但把硬件调校到能把这些任务做到极致。我们已经在图形处理或手机里的调制解调器中看到过这样的例子。它们是使用密集计算技术的专用架构,但不是通用的,不能为任意任务编程,只用来完成图形操作或调制解调器所需的运算。

And then of course some combinations of these. Can we come up with languages which match to these new domain specific architecture? Domain specific languages which improve the efficiency and let us code a range of applications very effectively.

最后是以上两者的某种结合。我们能不能设计出与这些新的特定领域架构相匹配的语言?特定领域语言既能提升效率,又能让我们高效地编写一类应用程序。

This is a fascinating slide from a paper that was done by Charles Leiserson and his colleagues at MIT and published in Science, called "There's plenty of room at the Top".

这是一张很有意思的图,出自查尔斯·雷瑟森(Charles Leiserson)和他在麻省理工学院的同事发表在《科学》杂志上的一篇论文,题为 There's plenty of room at the Top(《顶部仍有大量空间》)。

What they want to observe is that software inefficiency, and the inefficiency of matching software to hardware, means that we have lots of opportunity to improve performance. They took an admittedly very simple program, matrix multiply, written initially in Python, and ran it on an 18-core Intel processor. Simply by rewriting the code from Python to C they got a factor of 47 improvement. Then introducing parallel loops gave them another factor of approximately eight.

他们想说明的是,软件本身的低效,以及软件与硬件匹配不佳带来的低效,意味着我们还有大量提升性能的空间。他们选了一个公认非常简单的程序——矩阵乘法,最初用 Python 编写,运行在一块 18 核英特尔处理器上。仅仅把代码从 Python 改写成 C,他们就获得了 47 倍的提升;再引入并行循环,又带来了大约 8 倍的提升。
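作为参照,下面是这一实验起点的一个示意版本——最朴素的 Python 三重循环矩阵乘法(仅为示意,并非论文中的原始代码):

```python
# 动态类型、解释执行让它慢得惊人,这正是后续改写为 C 获得 47 倍提升的来源
def matmul(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            s = 0.0
            for k in range(m):
                s += A[i][k] * B[k][j]
            C[i][j] = s
    return C
```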

Then introducing memory optimizations: if you're familiar with large-scale matrix multiply, by doing it in a blocked fashion you can dramatically improve the ability to use the cache effectively, and thereby they got another factor of a little under 20, about 15. And then finally, using the vector instructions inside the Intel processor, they were able to gain another factor of 10. Overall this final program runs more than 62,000 times faster than the initial Python program.

接着引入内存优化:如果你熟悉大规模矩阵乘法就会知道,采用分块方式计算可以大幅提高缓存的利用效率,这又带来了接近 20 倍(约 15 倍)的提升。最后,利用英特尔处理器内部的向量指令,又获得了 10 倍的提升。总体而言,最终程序的运行速度比最初的 Python 程序快 62,000 多倍。
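把上述各步报告的加速倍数连乘,可以核对总加速比的量级(各倍数取近似值,属于示意性假设):

```python
steps = {"Python→C": 47, "并行循环": 8, "分块/缓存优化": 16.7, "SIMD 向量指令": 10}
total = 1.0
for name, factor in steps.items():
    total *= factor
    print(f"{name}: ×{factor} → 累计约 {total:,.0f} 倍")
# 累计约 6.3 万倍,与“比原始 Python 程序快 62,000 多倍”的说法量级一致
```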

Now this is not to say that you would get this for larger scale programs or all kinds of environments, but it's an example of how much inefficiency is there, at least for one simple application. Of course not many performance-sensitive things are written in Python, but even the improvement from C to the fully parallel version of C that uses SIMD instructions is similar to what you would get if you used a domain specific processor. It is significant just in its own right. That's nearly a factor of 100, more than 100, almost 150.

当然,这并不是说在更大规模的程序或所有环境下都能获得同样的改进,但这个例子至少说明了一个简单应用里蕴含着多少低效。当然,很少有性能敏感的程序是用 Python 写的,但即便只看从 C 到使用 SIMD 指令的完全并行版 C 的提升,其幅度也与使用特定领域处理器所能获得的相当。仅这一部分就非常可观——接近 100 倍,甚至超过 100 倍,差不多有 150 倍。

So there's lots of opportunities here and that's the key point behind us slide of an observation.

所以改进的空间非常大,这正是这张图背后的关键观察。

So what are these domain specific architectures? The idea is to achieve higher efficiency by telling the architecture the characteristics of the domain.

那么什么是特定领域架构呢?这类架构通过把特定领域的特性告知体系结构来实现更高的效率。

We're not trying to do just one application but we're trying to do a domain of applications like deep learning for example like computer graphics like virtual reality applications. So it's different from a strict ASIC that is designed to only one function like a modem for example.

我们要做的不只是某一个应用,而是一个应用领域,比如深度学习、计算机图形、虚拟现实应用。因此它不同于严格意义上的 ASIC,后者只为单一功能而设计,比如调制解调器。

It requires more domain specific knowledge. So we need to have a language which conveys important properties of the application that are hard to deduce if we start with a low level language like C. This is a product of codesign. We design the applications and the domain specific processor together, and that's critical to get these to work together.

它需要更多特定领域的知识。所以我们需要一种语言来传达应用程序的重要属性——如果从 C 这样的低级语言出发,这些属性很难推断出来。这是协同设计的产物:我们把应用程序和特定领域处理器放在一起设计,这对于让二者协同工作至关重要。

Notice that these are not going to be things on which we run general purpose applications. It's not the intention that we take 100 C code. It's the intention that we take an application designed to be run on that particular DSA, and we use a domain specific language to convey the information from the application to the processor that it needs to get significant performance improvements.

请注意,这些架构并不是用来运行通用应用程序的。我们的目的不是把现成的 C 代码直接拿来运行,而是让为特定 DSA 设计的应用运行在它上面,并用特定领域语言把处理器所需的信息从应用程序传递给它,从而获得显著的性能提升。

The key goal here is to achieve higher efficiency both in the use of power and transistors. Remember those are the two limiters: the rate at which transistor growth is going forward, and the issue of power from the lack of Dennard scaling. So we're trying to really improve the efficiency of that.

这里的关键目标是在功耗和晶体管的使用上实现更高的效率。请记住,这是两个限制因素:晶体管数量增长的速度在放缓,而登纳德缩放定律的失效带来了功耗问题。所以我们要切实提升这两方面的效率。

Good news? The good news here is that deep learning is a broadly applicable technology. It's the new programming model, programming with data rather than writing massive amounts of highly specialized code. Use data to train deep learning model to detect that kind of specialized circumstance in the data.

有什么好消息吗?好消息是深度学习是一种适用面很广的技术。它是一种新的编程模型:用数据编程,而不是编写大量高度专门化的代码——用数据去训练深度学习模型,让它去识别数据中对应的特定情形。
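“用数据编程”可以用一个玩具级的示意来说明:不手写判别规则,而是用带标注的数据训练一个极简模型(以下数据和模型均为假设的示例,与演讲中提到的任何真实系统无关):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # 标注数据定义了任务,而不是手写的规则

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):                         # 用梯度下降训练一个逻辑回归模型
    p = 1 / (1 + np.exp(-(X @ w + b)))
    w -= lr * X.T @ (p - y) / len(y)
    b -= lr * np.mean(p - y)

p = 1 / (1 + np.exp(-(X @ w + b)))
print("训练后准确率:", np.mean((p > 0.5) == y))
```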

And so we have a good target domain here. We have applications which are really demanding of massive amounts of performance increase through which we think there are appropriate domain specific architectures.

所以我们有一个很好的目标领域:这些应用确实需要性能的大幅提升,而我们认为针对它们存在合适的特定领域架构。

It's important to understand why these domain specific architectures can win; in particular, there's no magic here.

重要的是要理解这些特定领域架构为什么能够胜出——这里面并没有什么魔法。

People who are familiar with the books Dave Patterson and I co-authored together know that we believe in quantitative analysis in an engineering scientific approach to designing computers. So what makes these domain specific architectures more efficient?

熟悉我和大卫·帕特森(Dave Patterson)合著的教材的人都知道,我们相信用量化分析、以工程和科学的方法来设计计算机。那么,是什么让这些特定领域架构更高效呢?

First of all, they use a simple model for parallelism that works in a specific domain, and that means they can have less control hardware. So for example we switch from multiple-instruction, multiple-data models in a multicore to a single-instruction, multiple-data model. That means we dramatically improve the energy associated with fetching instructions, because now we have to fetch one instruction rather than many instructions.

首先,它们采用适用于特定领域的简单并行模型,因而可以使用更少的控制硬件。例如,我们从多核中的多指令流多数据流(MIMD)模型切换到单指令流多数据流(SIMD)模型。这意味着取指所消耗的能量大幅降低,因为现在只需要取一条指令,而不是很多条指令。
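可以用 NumPy 的向量化运算做一个类比示意:一条“指令”描述对大量数据的同一操作,从而摊销取指和控制的开销(这只是概念类比,并不等同于 DSA 硬件上的 SIMD 实现):

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float32)
y1 = x * 2.0 + 1.0                 # 一次调用描述对一百万个元素的同一操作
y2 = np.empty_like(x)
for i in range(len(x)):            # 对比:逐元素循环,每一步都要重新取指、判型、分派
    y2[i] = x[i] * 2.0 + 1.0
assert np.allclose(y1, y2)
```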

We move to VLIW versus speculative out-of-order mechanisms, so things that rely on being able to analyze the code better, to know about dependences, and therefore be able to create and structure parallelism at compile time rather than having to do it dynamically at runtime.

我们转向 VLIW(超长指令字),而不是推测式乱序执行机制。也就是说,依靠对代码更好的分析来了解其依赖关系,从而在编译时就创建和组织好并行性,而不必在运行时动态完成。

Second, we make more effective use of memory bandwidth. We go to user-controlled memory systems rather than caches. Caches are great except when you have large amounts of data streaming through them; then they're extremely inefficient, that's not what they're meant to do. They are meant to work when the program does repetitive things but in a somewhat unpredictable fashion. Here we have repetitive things in a very predictable fashion but very large amounts of data.

其次,我们更高效地利用内存带宽。我们采用由用户控制的内存系统,而不是缓存。缓存很好用,但当大量数据从中流过时就不行了,效率极低,这不是缓存该干的事。缓存适用的场景是程序在做重复、但访问模式多少有些难以预测的事情;而这里我们面对的是高度可预测的重复操作,只是数据量非常大。

So we go to an alternative: using prefetching and other techniques to move data into the memory within the domain specific processor. Once we get it into that memory, we can then make heavy use of the data before moving it back to the main memory.

所以我们改用另一种方式:利用预取等技术把数据搬进特定领域处理器内部的存储器;数据一旦进来,就在写回主存之前对它进行充分复用。

We eliminate unneeded accuracy. It turns out we need relatively much less accuracy than we do for general purpose computing here. In the case of integers, we need 8-16 bit integers. In the case of floating point, we need 16 to 32 bit, not 64-bit large-scale floating point numbers. So we get efficiency by making data items smaller and by making the arithmetic operations more efficient.

我们还去掉了不必要的精度。事实证明,这里需要的精度比通用计算低得多:整数只需要 8 到 16 位,浮点数只需要 16 到 32 位,而不是 64 位的大规模浮点数。让数据项变小、让算术运算更高效,我们因此获得了效率提升。
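降低精度的做法可以用一个常见的 8 位对称线性量化示意来说明(以下实现只是业界常见做法的一个草图,并非演讲中给出的具体方案):

```python
import numpy as np

w = np.random.default_rng(1).normal(size=1000).astype(np.float32)
scale = np.abs(w).max() / 127.0
w_int8 = np.clip(np.round(w / scale), -127, 127).astype(np.int8)  # 用 8 位整数存储,体积缩小到 1/4
w_back = w_int8.astype(np.float32) * scale
print("最大还原误差:", np.abs(w - w_back).max())                   # 精度损失很小,但带宽和能耗大幅下降
```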

The key is that the domain specific programming model matches the application to the processor. These are not general purpose processors. You are not gonna take a piece of C code and throw it on one of these processors and be happy with the results. They're designed to match a particular class of applications, and that structure is determined by the interface in the domain specific language and the underlying architecture.

关键在于,特定领域的编程模型把应用程序和处理器匹配起来。这些不是通用处理器,你不能把一段 C 代码扔到这样的处理器上就指望得到满意的结果。它们是为某一类特定应用设计的,其结构由特定领域语言中的接口和底层体系结构共同决定。

So this just shows you an example so you get an idea of how we're using silicon rather differently in these environments than we would in a traditional processor.

这里我们来看一个例子,以便了解在这些环境中硅片的使用方式与传统处理器有多大不同。

What I've done here is taken the first generation TPU-1, the first tensor processing unit from Google, but I could take the second or third or fourth and the numbers would be very similar. I show you what it looks like, a block diagram in terms of what the chip area is devoted to. There's a very large matrix multiply unit that can do 256 x 256 8-bit multiplies, and the later ones actually have floating point versions of that multiply. It has a unified buffer used for local activations, a memory buffer, interfaces, accumulators, a little bit of control, and interfaces to DRAM.

我在这里选的是谷歌第一代 TPU(张量处理单元),当然也可以选第二、第三或第四代,数字会非常相似。这是一张方框图,展示了芯片面积都用在了哪些地方。它有一个非常大的矩阵乘法单元,可以执行 256×256 的 8 位乘法,后来的版本还支持浮点乘法;有一个统一缓冲区,用于存放本地的激活值;还有接口、累加器、少量控制逻辑,以及连接 DRAM 的接口。
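根据原文给出的 256×256 的 8 位乘加阵列,可以粗算第一代 TPU 的峰值吞吐(其中 700 MHz 的主频取自公开发表的 TPU 论文,属于补充假设,并非演讲原文):

```python
macs_per_cycle = 256 * 256            # 65,536 个 8 位乘加单元
clock_hz = 700e6                      # 假设的主频,出自 TPU v1 论文
ops = macs_per_cycle * 2 * clock_hz   # 每次乘加按 2 次运算计
print(f"峰值约 {ops / 1e12:.0f} 万亿次运算/秒")   # ≈ 92 TOPS
```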

Today that would be high-bandwidth DRAMs; early on it was DDR3. So if you look at the way in which the area is used: 44% of it is used for memory, to store temporary results, weights and things being computed. Almost 40% is being used for compute, 15% for the interfaces and 2% for control.

如今用的会是高带宽 DRAM,早期用的是 DDR3。我们来具体看看面积的分配:44% 用于内存,以暂存权重和中间计算结果;将近 40% 用于计算,15% 用于接口,2% 用于控制。

Compare that to a single Skylake core from an Intel processor. In that case, 33% is being used for cache. So notice that we have more memory capacity on the TPU than we have on the Skylake core. In fact if you were to remove the tags from the cache, because that's overhead, it's not real data, the gap would be even larger: the amount on the Skylake core would probably drop to about 30%, so almost 50% more is being used for active data on the TPU.

把它与英特尔处理器的单个 Skylake 核心做比较。在 Skylake 上,33% 的面积用于缓存。请注意,TPU 上的内存容量比 Skylake 核心上的还要多。事实上,如果把缓存中的标签部分去掉——那是开销,不是真正的数据——这个差距还会更大:Skylake 核心上的这一比例大概会降到 30% 左右,也就是说 TPU 用于有效数据的面积多了将近 50%。

30% of the area is used for control. That's because the Skylake core is an out-of-order, dynamically scheduled processor, like most modern general purpose processors, and that requires significantly more area for control, roughly 15 times more. That control is overhead, and unfortunately the control unit is energy-intensive, so it's also a big power consumer. 21% is for compute.

30% 的面积用于控制。这是因为和大多数现代通用处理器一样,Skylake 核心是一个乱序执行、动态调度的处理器,需要多得多的控制面积,大约是 15 倍。这种控制是开销;不幸的是,控制单元的计算是高能耗的,所以它也是功耗大户。另有 21% 用于计算。

So notice that the big advantage that exists here is that the compute area is roughly double what it is in a Skylake core. There's also memory management overhead and finally miscellaneous overhead. So the Skylake core is using a lot more for control, a lot less for compute, and somewhat less for memory.

可以注意到,这里最大的优势在于 TPU 的计算面积几乎是 Skylake 核心的两倍。此外还有内存管理的开销,最后是各种杂项开销。总之,Skylake 核心把多得多的面积花在控制上,花在计算上的面积少得多,花在内存上的也略少。

So where does this bring us? We've come to an interesting time in the computing industry, and I just want to conclude by reflecting on this and saying something about how things are likely to go forward in the future, because I think we're at a real turning point in the history of computing.

那么这把我们带到了哪里?我们来到了计算行业一个非常有意思的时刻。最后,我想通过回顾这一点、并谈谈未来可能的走向来结束这次演讲,因为我认为我们正处在计算历史上一个真正的转折点。

From the 1960s, the introduction of the first real commercial computers, to 1980, we had largely vertically integrated companies.

从 20 世纪 60 年代第一批真正的商用计算机问世,到 1980 年,这个行业基本上由纵向整合的公司主导。

IBM, Burroughs, Honeywell, the early spin-outs of the activity at the University of Pennsylvania that built ENIAC, the first electronic computer.

IBM、宝来(Burroughs)、霍尼韦尔,以及从建造了第一台电子计算机 ENIAC 的宾夕法尼亚大学相关项目中早期衍生出来的公司,都是纵向整合的公司。

IBM is the perfect example of a vertically integrated company in that period. They did everything: they built their own chips, they built their own discs. In fact the West Coast operation of IBM here in California was originally opened to do disc technology, and the first Winchester discs were built on the West Coast.

IBM 是那个时期纵向整合公司的完美典范。他们什么都做:自己造芯片,自己造磁盘。事实上,IBM 在加利福尼亚的西海岸业务最初就是为了做磁盘技术而设立的,第一批温彻斯特硬盘就是在西海岸造出来的。

They built their own processors, the 360, 370 series, etc. After that they built their own operating system, they built their own compilers. They even built their own databases. They built their networking software. In some cases they even built application programs, but certainly the core of the system, from the fundamental hardware up through the databases, OS and compilers, was all built by IBM. And the driver here was technical concentration. IBM could put together the expertise across this wide set of things, assemble a world-class team and really optimize across the stack, in a way that enabled their operating system to do things such as virtual memory long before other commercial offerings could do that.

他们还构建了自己的处理器,比如 360、370 系列等。在此之上,他们开发了自己的操作系统、自己的编译器,甚至建立了自己的数据库、自己的网络软件,某些情况下还编写应用程序。可以肯定的是,从底层硬件到数据库、操作系统、编译器,这个系统的核心都是 IBM 自己构建的。这背后的驱动力是技术的集中:IBM 能把这些广泛领域的专业知识汇集起来,组建一支世界级团队,并在整个技术栈上进行优化,使他们的操作系统能够实现诸如虚拟内存之类的能力,远远早于其他商业公司。

And then the world changed, really changed, with the introduction of the personal computer. And the microprocessor began to take off.

接着世界变了,真正的改变来自个人电脑的问世,微处理器开始崛起。

Then we changed from a vertically organized industry to a horizontally organized industry. We had silicon manufacturers, Intel for example doing processors along with AMD and initially several other companies like Fairchild and Motorola. We had a company like TSMC arise as a silicon foundry, making silicon for others. That's something that didn't exist earlier but really began to take off in the late 80s and 90s, and it enabled other people to build chips for graphics or other functions outside the processor.

于是这个行业从纵向组织转变为横向组织。出现了专门的芯片厂商,例如做处理器的英特尔和 AMD,早期还有仙童半导体和摩托罗拉等几家公司。还出现了台积电这样靠为别人代工制造芯片而崛起的公司。这在更早的时期是不存在的,但在 80 年代末和 90 年代真正发展起来,使其他人也能够制造图形芯片或处理器之外其他功能的芯片。

But Intel didn't do everything. Intel did the processors, and Microsoft then came along and did the OS and compilers on top of that. And companies like Oracle came along and built their databases and other applications on top of that. So it became a very horizontally organized industry. The key driver behind this, obviously, was the introduction of the personal computer.

但英特尔并没有包揽一切。英特尔做处理器,然后微软出现了,在其之上做操作系统和编译器;接着甲骨文这样的公司出现,在此基础上构建数据库和其他应用。于是这个行业变成了高度横向组织的行业。这背后的关键驱动因素,显然是个人电脑的出现。

The rise of shrinkwrap software, something a lot of us did not foresee coming, really became a crucial driver. It meant that the number of architectures that you could easily support had to be kept fairly small, because the software companies doing shrinkwrap software did not want to have to port and verify that their software worked on lots of different architectures.

套装软件的兴起是我们很多人没有预料到的,但它确实成为一个关键驱动因素。这意味着能够轻松支持的体系结构数量必须保持相当少,因为做套装软件的公司不想把自己的软件移植到大量不同的体系结构上并逐一验证。

And of course the rise, the dramatic growth, of the general purpose microprocessor. This is the period in which the microprocessor replaced all other technologies, including the largest supercomputers. And I think it happened much faster than we expected: by the mid 80s the microprocessor put a serious dent in the minicomputer business, the mainframe business was struggling by the early 90s, and by the mid 90s to 2000s it was really taking a bite out of the supercomputer industry. So even the supercomputer industry converted from customized special architectures into arrays of these general purpose microprocessors. They were just far too efficient in terms of cost and performance to be ignored.

当然,还有通用微处理器的迅猛增长。在这个时期,微处理器取代了所有其他技术,包括最大的超级计算机。我认为这一切发生得比我们预期的快得多:80 年代中期,微处理器给小型机业务造成了沉重打击;到 90 年代初,大型机业务也陷入困境;90 年代中期到 2000 年代,它又蚕食了超级计算机行业。于是连超级计算机行业也从定制的专用架构转向由通用微处理器组成的阵列——它们在成本和性能上的效率实在太高,无法忽视。

Now we're all of a sudden in a new era, not because general purpose processors are going to completely go away. They are going to remain important, but they'll be less central to the drive; for the very fastest, most important applications, domain specific processors will begin to play a key role. So rather than so much a horizontal organization, we will see again a more vertical integration between the people who have the models for deep learning and machine learning systems and the people who build the OS and compilers that enable those to run efficiently, train efficiently, as well as be deployed in the field.

现在我们突然进入了一个新时代。这并不意味着通用处理器会彻底消失,它们仍然重要,但将不再处于驱动的中心;在发展最快、最重要的应用上,特定领域处理器将开始扮演关键角色。因此,相比纯粹的横向分工,我们将再次看到更多的纵向整合:掌握深度学习和机器学习模型的人,与构建操作系统和编译器、让这些模型能够高效运行、高效训练并部署到实际场景中的人之间,会走向更紧密的整合。

Inference is a critical part. It means when we deploy these in the field we will probably have lots of very specialized processors that each do one particular problem. The processor that sits in a camera, for example a security camera, is going to have a very limited use. The key is going to be optimizing for power and efficiency in that key use, and cost of course. So we see a different kind of integration, and Microsoft, Google and Apple are all looking at this.

推理是其中关键的一环。这意味着当我们把这些系统部署到现场时,可能会有大量非常专门的处理器,各自只处理某一类特定问题。例如安防摄像头里的处理器,用途就非常有限。关键是针对这个特定用途去优化功耗和效率,当然还有成本。所以我们会看到一种不同形态的整合,微软、谷歌和苹果都在关注这个方向。

The Apple M1 is a perfect example. If you look at the Apple M1, it's a processor designed by Apple with a deep understanding of the applications that are likely to run on that processor. So they have a special purpose graphics processor, they have a special purpose machine learning domain accelerator on there, and then they have multiple cores, but even the cores are not completely homogeneous. Some are slow, low-power cores, and some are high-speed, high-performance, higher-power cores. So we see a completely different design approach, with lots more codesign and vertical integration.

Apple M1 就是一个完美的例子。它是由苹果设计的处理器,设计者对可能在其上运行的应用有深刻的理解。所以上面有专用的图形处理器、专用的机器学习领域加速器,还有多个核心——而且这些核心也不是完全同构的:有些是低功耗的慢速核心,有些是高速、高性能、功耗更高的核心。我们看到的是一种完全不同的设计方法,有更多的协同设计和纵向整合。

We're optimizing in a different way than we had in the past, and I think this is going to slowly but surely change the entire computer industry. Not that the general purpose processor will go away, and not that the companies that make software that runs on multiple machines will completely go away, but we will have a whole new driver, and that driver is created by the dramatic breakthroughs that we've seen in deep learning and machine learning. I think this is going to make for a really interesting next 20 years.

我们正在以与过去不同的方式进行优化。我认为这会缓慢但确定地改变整个计算机行业。这不是说通用处理器会消失,也不是说做跨平台软件的公司会彻底消失,而是说这个行业将有一个全新的驱动力——由我们在深度学习和机器学习中看到的重大突破所创造的驱动力。我认为这会让接下来的 20 年变得非常有意思。

Thank you for your kind attention and I'd like to wish the 2021 T-EDGE conference a great success. Thank you.

最后,感谢各位的耐心聆听,也预祝 2021 年 T-EDGE 大会圆满成功,谢谢。

(本文首发钛媒体App)
