DeepSeek has released a new paper, with co-founder Liang Wenfeng credited as a contributor, detailing how its latest large language model DeepSeek-V3 achieves efficient training and inference using only 2,048 H800 GPUs – significantly fewer than the tens of thousands typically required. The team attributes this efficiency to four key innovations: memory optimization through multi-head latent attention (MLA), computational savings via a Mixture-of-Experts (MoE) design with FP8 precision, communication improvements using a multi-plane network topology, and faster inference through multi-token prediction (MTP). With MLA, KV cache memory usage is cut to just 70KB per token, as little as one-seventh that of competing models. The MoE architecture activates only 37 billion of the model's 671 billion parameters per forward pass, reducing training costs by 90% compared to dense models. FP8 training further halves compute and memory usage with minimal accuracy tradeoff. Beyond the model, the paper also outlines five future directions for AI hardware design, advocating tighter integration between software and hardware to address memory, compute, and networking bottlenecks. [36Kr, in Chinese]
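To illustrate why MoE activates only a fraction of total parameters per forward pass, here is a minimal sketch of sparse top-k expert routing. The expert count, top-k value, ReLU feed-forward experts, and softmax gating below are toy assumptions for illustration, not DeepSeek-V3's actual configuration or the paper's implementation.

```python
# Toy sketch of sparse Mixture-of-Experts routing (illustrative assumptions,
# not DeepSeek-V3's exact architecture or expert counts).
import numpy as np

rng = np.random.default_rng(0)

d_model = 64     # hidden size (toy value)
n_experts = 16   # routed experts (toy value)
top_k = 2        # experts activated per token (toy value)

# Each expert is a small feed-forward layer; only top_k of them run per token.
experts = [(rng.standard_normal((d_model, 4 * d_model)) * 0.02,
            rng.standard_normal((4 * d_model, d_model)) * 0.02)
           for _ in range(n_experts)]
router = rng.standard_normal((d_model, n_experts)) * 0.02  # gating weights

def moe_forward(x):
    """Route one token vector to its top_k experts and mix their outputs."""
    logits = x @ router                             # token-to-expert affinities
    top = np.argpartition(logits, -top_k)[-top_k:]  # indices of the top_k experts
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                            # softmax over selected experts only
    out = np.zeros_like(x)
    for g, idx in zip(gates, top):
        w1, w2 = experts[idx]
        out += g * (np.maximum(x @ w1, 0.0) @ w2)   # ReLU feed-forward expert
    return out

token = rng.standard_normal(d_model)
y = moe_forward(token)
# Only top_k / n_experts of the expert parameters are touched per token,
# which is the mechanism behind activating ~37B of 671B parameters per pass.
print(y.shape, f"active experts: {top_k}/{n_experts}")
```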