The foundational models of Llama 2 were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data and to upsample sources considered trustworthy. Llama 2 – Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for the five other sets, including OpenAI Summarize and StackExchange.
On July 18, 2023, in partnership with Microsoft, Meta announced Llama 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters. The model architecture remains largely unchanged from that of the LLaMA-1 models, but 40% more data was used to train the foundational models. The accompanying preprint also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.
Software developer Georgi Gerganov released llama.cpp as open source on March 10, 2023. It is a re-implementation of LLaMA in C++, allowing systems without a powerful GPU to run the model locally. The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata in a single file. The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.
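As a rough illustration of how such quantized GGUF files are typically consumed, the sketch below uses the third-party llama-cpp-python bindings for llama.cpp; the file name and generation parameters are placeholders, not values taken from this article.

```python
# Minimal sketch of running a GGUF-quantized model on CPU via llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder; any 4-bit
# quantized Llama-family GGUF file downloaded separately would work similarly.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=4096,     # context window to allocate
    n_threads=8,    # CPU threads used for inference
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],    # stop generating when the next question would start
)
print(out["choices"][0]["text"])
```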
"Recommendation letter for the Magic Unicorn Corporation" – output of the 65-billion-parameter LLaMA model before instruction tuning, given the prompt (in bold):

Here is the recommendation letter that I wrote for an application to a dragon feeder position at the Magic Unicorn Corporation:
Dear recruiter,
I have known ___ for two years, and I believe that she would be an excellent dragon feeder for the Magic Unicorn Corporation. ___ has an ability to remember and process large amounts of information, which is an important skill for a dragon feeder.
___, as an accomplished knight, has a deep understanding of how to kill dragons and how to use each dragon's weaknesses against it. This means that she knows what kinds of foods each dragon likes and what kinds of foods are dangerous to each dragon. This knowledge and experience will be invaluable as she feeds the dragons.
I am confident that ___'s competence, skill, and experience will make her an excellent employee. Please contact me at (___) ___-___ if you have any questions. I look forward to hearing from you.
Best regards,
Honorable Knight
Sir George
On April 18, 2024, Meta released Llama 3 with two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing in April 2024 showed that Llama 3 70B was beating Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window. Llama 3.1 was subsequently released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.
During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Compared to previous models, Zuckerberg stated that the team was surprised the 70B model was still learning even at the end of its 15T-token training, and that the decision was made to end training in order to focus GPU power elsewhere:

"the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released even by the end, it was... still learning right it's like we probably could have fed it more tokens and it would have gotten somewhat better but i mean at some point you know you're running a company you need to do these meta reasoning questions of how do I want to spend our GPUs"
The release of Llama models has sparked significant debate about the benefits and misuse risks of open-weight models. Such models can be fine-tuned to remove safeguards, notably by cyber criminals, until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including safety research.
Code Llama is a fine-tune of Llama 2 with code-specific datasets. The 7B, 13B, and 34B versions were released on August 24, 2023, with the 70B released on January 29, 2024. Starting from the Llama 2 foundation models, Meta AI trained on an additional 500B tokens of code data, followed by 20B tokens of long-context data, to create the Code Llama foundation models. These foundation models were further trained on 5B tokens of instruction-following data to create the instruct fine-tunes. Another foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.
LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost for LLMs is inference on the trained model rather than the computational cost of the training process.
Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected throughout the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance. The inference code used to run the model was publicly released under the open-source GPLv3 license. Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".
According to the Q4 2023 earnings call transcript, Meta adopted the strategy of open weights to improve model safety and iteration speed, to increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.
Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023; the latest version is Llama 3.1, released in July 2024. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Llama models are trained at different parameter sizes, ranging between 7B and 405B. Originally, Llama was available only as a foundation model; starting with Llama 2, Meta AI began releasing instruction fine-tuned versions alongside the foundation models.
Stanford University's Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost. The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.
For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences, one for safety and one for helpfulness, and used for reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of Proximal Policy Optimization (PPO) for RLHF: a new technique based on rejection sampling was used first, followed by PPO.
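A minimal sketch of the rejection-sampling idea is shown below: sample several candidate responses per prompt, score them with a reward model, and keep only the best-scoring candidate for further fine-tuning. The `generate` and `reward` functions are toy stand-ins, not Meta's models or scoring pipeline.

```python
# Sketch of best-of-N rejection sampling for preference-based fine-tuning data.
# `generate` and `reward` are placeholders standing in for a policy model and a
# learned reward model; in practice both would be neural networks.
import random

def generate(prompt: str, n: int) -> list[str]:
    """Toy stand-in for sampling n candidate responses from the current policy."""
    return [f"{prompt} -> candidate {i} ({random.random():.2f})" for i in range(n)]

def reward(prompt: str, response: str) -> float:
    """Toy stand-in for a reward model scoring a (prompt, response) pair."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Keep the highest-reward candidate; the winners form the fine-tuning set."""
    candidates = generate(prompt, n)
    return max(candidates, key=lambda r: reward(prompt, r))

if __name__ == "__main__":
    prompts = ["Explain rejection sampling briefly.", "Write a polite refusal."]
    finetune_set = [(p, best_of_n(p)) for p in prompts]
    for p, r in finetune_set:
        print(p, "=>", r)
```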
Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments. Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.
Llama 1 models are available only as foundational models, trained with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from the foundational Llama 2 models. Unlike GPT-4, which increased context length during fine-tuning, Llama 2 and Code Llama – Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function, with the token loss on user prompts zeroed out, and a batch size of 64.
An empirical finding from the Llama series concerns scaling laws: it was observed with the Llama 3 models that when a model is trained on data well beyond the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is about 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens.
llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt-evaluation performance for FP16 and 8-bit quantized data types.
Comparison of models

In the training cost column, only the largest model's cost is listed; for example, "21,000" is the training cost of Llama 2 69B in units of petaFLOP-day, where 1 petaFLOP-day = 1 petaFLOP/sec Ă— 1 day = 8.64E19 FLOP.

Name        Release date        Parameters               Corpus size  Commercial viability?  Context length  Training cost (petaFLOP-day)
LLaMA       February 24, 2023   6.7B, 13B, 32.5B, 65.2B  1–1.4T       No                     2048            6,300
Llama 2     July 18, 2023       6.7B, 13B, 69B           2T           Yes                    4096            21,000
Code Llama  August 24, 2023     6.7B, 13B, 33.7B, 69B    –            Yes                    –               –
Llama 3     April 18, 2024      8B, 70.6B                15T          Yes                    8192            100,000
Llama 3.1   July 23, 2024       8B, 70.6B, 405B          –            Yes                    128,000         440,000
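To make the unit concrete, the short calculation below verifies the petaFLOP-day conversion and, under the common rule of thumb (an assumption, not a figure from this article) that training cost is roughly 6 Ă— parameters Ă— tokens FLOP, checks that the order of magnitude of the table's figures is plausible.

```python
# Unit check: 1 petaFLOP-day = 1e15 FLOP/s * 86,400 s = 8.64e19 FLOP.
PFLOP_DAY = 1e15 * 86_400
print(f"{PFLOP_DAY:.3e}")                         # 8.640e+19

# Rule-of-thumb estimate (assumption): training FLOPs ~ 6 * N * D.
def train_cost_pflop_days(params: float, tokens: float) -> float:
    return 6 * params * tokens / PFLOP_DAY

# A 70B-parameter model on 2T tokens vs. the table's 21,000 petaFLOP-day figure.
print(round(train_cost_pflop_days(70e9, 2e12)))   # ~9,700: same order of magnitude
```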
Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis. Unauthorized copies of the model were shared via BitTorrent. In response, Meta AI issued DMCA takedown requests against repositories sharing the link on GitHub.
Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide helpful presentation tips, and assist with message responses. This AI Companion is powered by multiple models, including Meta Llama 2.
Llama 2 includes both foundation models and models fine-tuned for chat. In a further departure from LLaMA, all models are released with weights and are free for many commercial use cases. However, due to some remaining restrictions, Meta's description of Llama as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).
On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities. That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation. On March 4, a pull request was opened to add links to HuggingFace repositories containing the model. On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.
The Llama 3 dataset consists of mainly English data, with over 5% in over 30 other languages. The dataset was filtered by a text-quality classifier, and the classifier was itself trained on text synthesized by Llama 2.
Llama was trained only on publicly available information, and was trained at various model sizes with the intention of making it more accessible to different hardware. Meta AI reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.
The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.
After the release of large language models such as GPT-3, a focus of research was up-scaling models, which in some instances showed major increases in emergent capabilities. The release of ChatGPT and its surprise success caused an increase in attention to large language models. Compared with other responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.
1548:"Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama"
281:
2453:
873:
foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:
On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open-source version of the LLaMA dataset. The dataset has approximately 1.2 trillion tokens and is publicly available for download.
Wired describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.
Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine. It shows increased performance on medical-related benchmarks such as MedQA and MedMCQA.
Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, as well as a standalone website. Both services use a Llama 3 model.
Like GPT-3, the Llama series of models are decoder-only Transformers, but there are some minor differences: the SwiGLU activation function is used instead of GeLU; rotary positional embeddings (RoPE) are used instead of absolute positional embedding; and RMSNorm is used instead of layer normalization.
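For illustration, the sketch below implements RMSNorm and rotary positional embeddings in plain PyTorch. The tensor shapes are illustrative, and the default base θ is simply set to 500,000 (the value listed for Llama 3.1 in the hyperparameter table); this is not an excerpt of Meta's code.

```python
# Sketch of two Llama architecture components: RMSNorm and rotary embeddings (RoPE).
# Shapes and the theta base are illustrative; this is not Meta's implementation.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Root-mean-square normalization: rescale by 1/RMS(x), no mean subtraction."""
    rms = x.pow(2).mean(dim=-1, keepdim=True).add(eps).rsqrt()
    return x * rms * weight

def apply_rope(x: torch.Tensor, theta: float = 500_000.0) -> torch.Tensor:
    """Rotate pairs of channels by position-dependent angles (rotary embeddings)."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    half = head_dim // 2
    freqs = 1.0 / (theta ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Toy usage: one attention head, sequence of 16 tokens, head dimension 8.
h = torch.randn(16, 8)
print(rms_norm(h, weight=torch.ones(8)).shape)  # torch.Size([16, 8])
print(apply_rope(h).shape)                      # torch.Size([16, 8])
```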
2217:"EPFL's new Large Language Model for Medical Knowledge"
1802:
1599:"Facebook's Powerful Large Language Model Leaks Online"
1373:"Examining Emergent Abilities in Large Language Models"
1013:
School of Computer and Communication Sciences, and the
1009:
papers, and articles. It was created by researchers at
264:
An empirical investigation of the Llama series was the
2137:"Stanford takes costly, risky Alpaca AI model offline"
1355:"Meet Your New Assistant: Meta AI, Built With Llama 3"
849:{\displaystyle \operatorname {RoPE} (\theta =500,000)}
1836:"The Falcon has landed in the Hugging Face ecosystem"
818:
2385:"Meta's amped-up AI agents confusing Facebook users"
2085:. Stanford Center for Research on Foundation Models.
1523:"Meta's LLaMA Leaked to the Public, Thanks To 4chan"
645:– Output of 65 billion parameter LLaMA model before
2048:
246:After the release of large language models such as
2095:
1898:"llama3/MODEL_CARD.md at main · meta-llama/llama3"
848:
2332:
2007:
1885:The model card has some more interesting info too
2498:
2450:
1860:"llama/MODEL_CARD.md at main · meta-llama/llama"
1335:Peters, Jay; Vincent, James (24 February 2023).
1261:
677:(RoPE) instead of absolute positional embedding;
2314:"Quantize Llama models with GGUF and llama.cpp"
1965:
1218:Malik, Yuvraj; Paul, Katie (25 February 2023).
914:source code for scientific papers uploaded to
373:Llama 2 includes foundation models and models
1986:
1646:
1516:
1514:
1463:
1461:
1459:
1334:
1188:
1186:
1184:
883:Open source repositories of source code from
1304:
1302:
1300:
1257:
1255:
1235:
1233:
1231:
1229:
1115:"llama3/LICENSE at main · meta-llama/llama3"
1038:Software developer Georgi Gerganov released
612:
299:benchmarks exceeded that of the much larger
2165:
2022:
1642:
1640:
1638:
1511:
1456:
1181:
973:(PPO) for RLHF – a new technique based on
967:Reinforcement learning from human feedback
2101:
2013:
1992:
1971:
1950:
1814:
1717:
1652:
1425:
1297:
1252:
1226:
1217:
1147:
1136:
1134:
1132:
1130:
1128:
2214:
1622:
1616:
1011:École Polytechnique Fédérale de Lausanne
663:, but there are some minor differences:
2311:
2261:
1944:
1732:
1667:
1635:
1239:
427:
2499:
2467:from the original on September 5, 2023
1165:
1163:
1161:
1159:
1125:
995:Human-Centered Artificial Intelligence
366:On July 18, 2023, in partnership with
2338:
1798:
1796:
1770:
1540:
1284:
2312:Labonne, Maxime (29 November 2023).
2245:"How Companies Are Using Meta Llama"
2215:Petersen, Tanya (28 November 2023).
1751:
1486:
861:
1596:
1426:Badminton, Nik (13 February 2023).
1156:
13:
2443:
2412:
2364:
1828:
1793:
1520:
1262:OpSec Online LLC (21 March 2023).
275:
222:Alongside the release of Llama 3,
14:
2533:
2481:
2134:
1623:Willison, Simon (11 March 2023).
1285:David, Emilia (30 October 2023).
Key hyperparameters of Llama 3.1

                       8B         70B        405B
Model Dimension        4,096      8,192      16,384
FFN Dimension          14,336     28,672     53,248
Layers                 32         80         126
Attention Heads        32         64         128
Key/Value Heads        8          8          8
Peak Learning Rate     3 Ă— 10^-4  1.5 Ă— 10^-4  0.8 Ă— 10^-4
Activation Function    SwiGLU (all sizes)
Vocabulary Size        128,000 (all sizes)
Positional Embeddings  RoPE (θ = 500,000) (all sizes)