The foundational models of Llama 2 were trained on a data set with 2 trillion tokens. This data set was curated to remove websites that often disclose personal data and to upsample sources considered trustworthy. Llama 2 – Chat was additionally fine-tuned on 27,540 prompt-response pairs created for this project, which performed better than larger but lower-quality third-party datasets. For AI alignment, reinforcement learning with human feedback (RLHF) was used with a combination of 1,418,091 Meta examples and seven smaller datasets. The average dialog depth was 3.9 in the Meta examples, 3.0 for the Anthropic Helpful and Anthropic Harmless sets, and 1.0 for the five other sets, including OpenAI Summarize and StackExchange.
On July 18, 2023, in partnership with Microsoft, Meta announced Llama 2, the next generation of Llama. Meta trained and released Llama 2 in three model sizes: 7, 13, and 70 billion parameters. The model architecture remains largely unchanged from that of the LLaMA-1 models, but 40% more data was used to train the foundational models. The accompanying preprint also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.
Software developer Georgi Gerganov released llama.cpp as open source on March 10, 2023. It is a re-implementation of LLaMA in C++, allowing systems without a powerful GPU to run the model locally. The llama.cpp project introduced the GGUF file format, a binary format that stores both tensors and metadata in a single file. The format focuses on supporting different quantization types, which can reduce memory usage and increase speed at the expense of lower model precision.
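As a rough illustration of how such quantized GGUF files are typically consumed, the sketch below uses the third-party llama-cpp-python bindings for llama.cpp; the file name and generation parameters are placeholders, not values taken from this article.

```python
# Minimal sketch of running a GGUF-quantized model on CPU via llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder; any 4-bit
# quantized Llama-family GGUF file downloaded separately would work similarly.
from llama_cpp import Llama

llm = Llama(
    model_path="./llama-2-7b-chat.Q4_K_M.gguf",  # hypothetical local GGUF file
    n_ctx=4096,     # context window to allocate
    n_threads=8,    # CPU threads used for inference
)

out = llm(
    "Q: Name the planets in the solar system. A:",
    max_tokens=64,
    stop=["Q:"],    # stop generating when the next question would start
)
print(out["choices"][0]["text"])
```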
"Recommendation letter for the Magic Unicorn Corporation" – output of the 65-billion-parameter LLaMA model before instruction tuning, given the prompt (in bold):

Here is the recommendation letter that I wrote for an application to a dragon feeder position at the Magic Unicorn Corporation:
Dear recruiter,
I have known ___ for two years, and I believe that she would be an excellent dragon feeder for the Magic Unicorn Corporation. ___ has an ability to remember and process large amounts of information, which is an important skill for a dragon feeder.
___, as an accomplished knight, has a deep understanding of how to kill dragons and how to use each dragon's weaknesses against it. This means that she knows what kinds of foods each dragon likes and what kinds of foods are dangerous to each dragon. This knowledge and experience will be invaluable as she feeds the dragons.
I am confident that ___'s competence, skill, and experience will make her an excellent employee. Please contact me at (___) ___-___ if you have any questions. I look forward to hearing from you.
Best regards,
Honorable Knight
Sir George
On April 18, 2024, Meta released Llama 3 with two sizes: 8B and 70B parameters. The models were pre-trained on approximately 15 trillion tokens of text gathered from "publicly available sources", with the instruct models fine-tuned on "publicly available instruction datasets, as well as over 10M human-annotated examples". Meta AI's testing in April 2024 showed that Llama 3 70B was beating Gemini Pro 1.5 and Claude 3 Sonnet on most benchmarks. Meta also announced plans to make Llama 3 multilingual and multimodal, better at coding and reasoning, and to increase its context window. Llama 3.1 was subsequently released on July 23, 2024, with three sizes: 8B, 70B, and 405B parameters.
During an interview with Dwarkesh Patel, Mark Zuckerberg said that the 8B version of Llama 3 was nearly as powerful as the largest Llama 2. Compared to previous models, Zuckerberg stated that the team was surprised the 70B model was still learning even at the end of its 15T-token training, and that the decision was made to end training in order to focus GPU power elsewhere:

"the 8 billion is nearly as powerful as the biggest version of Llama 2 that we released even by the end, it was... still learning right it's like we probably could have fed it more tokens and it would have gotten somewhat better but i mean at some point you know you're running a company you need to do these meta reasoning questions of how do I want to spend our GPUs"
The release of Llama models has sparked significant debate about the benefits and misuse risks of open-weight models. Such models can be fine-tuned to remove safeguards, notably by cyber criminals, until they comply with harmful requests. Some experts contend that future models may facilitate causing damage more than defending against it, for example by making it relatively easy to engineer advanced bioweapons without specialized knowledge. Conversely, open-weight models can be useful for a wide variety of purposes, including safety research.
Code Llama is a fine-tune of Llama 2 with code-specific datasets. The 7B, 13B, and 34B versions were released on August 24, 2023, with the 70B released on January 29, 2024. Starting from the Llama 2 foundation models, Meta AI trained on an additional 500B tokens of code data, followed by 20B tokens of long-context data, to create the Code Llama foundation models. These foundation models were further trained on 5B tokens of instruction-following data to create the instruct fine-tunes. Another foundation model was created for Python code, trained on 100B tokens of Python-only code before the long-context data.
LLaMA's developers focused their effort on scaling the model's performance by increasing the volume of training data rather than the number of parameters, reasoning that the dominant cost for LLMs is inference on the trained model rather than the computational cost of the training process.
Multi-turn consistency in dialogs was targeted for improvement, to make sure that "system messages" (initial instructions, such as "speak in French" and "act like Napoleon") are respected throughout the dialog. This was accomplished using the new "Ghost attention" technique during training, which concatenates relevant instructions to each new user message but zeros out the loss function for tokens in the prompt (earlier parts of the dialog).
LLaMA was announced on February 24, 2023, via a blog post and a paper describing the model's training, architecture, and performance. The inference code used to run the model was publicly released under the open-source GPLv3 license. Access to the model's weights was managed by an application process, with access to be granted "on a case-by-case basis to academic researchers; those affiliated with organizations in government, civil society, and academia; and industry research laboratories around the world".
According to the Q4 2023 earnings call transcript, Meta adopted the strategy of open weights to improve model safety and iteration speed, to increase adoption among developers and researchers, and to become the industry standard. Llama 5, 6, and 7 are planned for the future.
Llama (Large Language Model Meta AI, formerly stylized as LLaMA) is a family of autoregressive large language models (LLMs) released by Meta AI starting in February 2023; the latest version is Llama 3.1, released in July 2024. Subsequent versions of Llama were made accessible outside academia and released under licenses that permitted some commercial use. Llama models are trained at different parameter sizes, ranging between 7B and 405B. Originally, Llama was available only as a foundation model; starting with Llama 2, Meta AI began releasing instruction fine-tuned versions alongside the foundation models.
Stanford University's Institute for Human-Centered Artificial Intelligence (HAI) Center for Research on Foundation Models (CRFM) released Alpaca, a training recipe based on the LLaMA 7B model that uses the "Self-Instruct" method of instruction tuning to acquire capabilities comparable to the OpenAI GPT-3 series text-davinci-003 model at a modest cost. The model files were officially removed on March 21, 2023, over hosting costs and safety concerns, though the code and paper remain online for reference.
For AI alignment, human annotators wrote prompts and then compared two model outputs (a binary protocol), giving confidence levels and separate safety labels with veto power. Two separate reward models were trained from these preferences, one for safety and one for helpfulness, and used for reinforcement learning from human feedback (RLHF). A major technical contribution is the departure from the exclusive use of Proximal Policy Optimization (PPO) for RLHF: a new technique based on rejection sampling was used first, followed by PPO.
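A minimal sketch of the rejection-sampling idea is shown below: sample several candidate responses per prompt, score them with a reward model, and keep only the best-scoring candidate for further fine-tuning. The `generate` and `reward` functions are toy stand-ins, not Meta's models or scoring pipeline.

```python
# Sketch of best-of-N rejection sampling for preference-based fine-tuning data.
# `generate` and `reward` are placeholders standing in for a policy model and a
# learned reward model; in practice both would be neural networks.
import random

def generate(prompt: str, n: int) -> list[str]:
    """Toy stand-in for sampling n candidate responses from the current policy."""
    return [f"{prompt} -> candidate {i} ({random.random():.2f})" for i in range(n)]

def reward(prompt: str, response: str) -> float:
    """Toy stand-in for a reward model scoring a (prompt, response) pair."""
    return random.random()

def best_of_n(prompt: str, n: int = 8) -> str:
    """Keep the highest-reward candidate; the winners form the fine-tuning set."""
    candidates = generate(prompt, n)
    return max(candidates, key=lambda r: reward(prompt, r))

if __name__ == "__main__":
    prompts = ["Explain rejection sampling briefly.", "Write a polite refusal."]
    finetune_set = [(p, best_of_n(p)) for p in prompts]
    for p, r in finetune_set:
        print(p, "=>", r)
```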
Reactions to the leak varied. Some speculated that the model would be used for malicious purposes, such as more sophisticated spam. Others celebrated the model's accessibility, as well as the fact that smaller versions of the model can be run relatively cheaply, suggesting that this will promote the flourishing of additional research developments. Multiple commentators, such as Simon Willison, compared LLaMA to Stable Diffusion, a text-to-image model which, unlike comparably sophisticated models that preceded it, was openly distributed, leading to a rapid proliferation of associated tools, techniques, and software.
Llama 1 models are available only as foundational models, trained with self-supervised learning and without fine-tuning. Llama 2 – Chat models were derived from the foundational Llama 2 models. Unlike GPT-4, which increased context length during fine-tuning, Llama 2 and Code Llama – Chat have the same context length of 4K tokens. Supervised fine-tuning used an autoregressive loss function, with the token loss on user prompts zeroed out, and a batch size of 64.
An empirical finding from the Llama series concerns scaling laws: it was observed with the Llama 3 models that when a model is trained on data well beyond the "Chinchilla-optimal" amount, performance continues to scale log-linearly. For example, the Chinchilla-optimal dataset for Llama 3 8B is about 200 billion tokens, but performance continued to scale log-linearly up to the 75-times larger dataset of 15 trillion tokens.
llamafile, created by Justine Tunney, is an open-source tool that bundles llama.cpp with the model into a single executable file. Tunney et al. introduced new optimized matrix multiplication kernels for x86 and ARM CPUs, improving prompt-evaluation performance for FP16 and 8-bit quantized data types.
Comparison of models

In the training cost column, only the largest model's cost is listed; for example, "21,000" is the training cost of Llama 2 69B in units of petaFLOP-day, where 1 petaFLOP-day = 1 petaFLOP/sec Ă— 1 day = 8.64E19 FLOP.

Name        Release date        Parameters               Corpus size  Commercial viability?  Context length  Training cost (petaFLOP-day)
LLaMA       February 24, 2023   6.7B, 13B, 32.5B, 65.2B  1–1.4T       No                     2048            6,300
Llama 2     July 18, 2023       6.7B, 13B, 69B           2T           Yes                    4096            21,000
Code Llama  August 24, 2023     6.7B, 13B, 33.7B, 69B    –            Yes                    –               –
Llama 3     April 18, 2024      8B, 70.6B                15T          Yes                    8192            100,000
Llama 3.1   July 23, 2024       8B, 70.6B, 405B          –            Yes                    128,000         440,000
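To make the unit concrete, the short calculation below verifies the petaFLOP-day conversion and, under the common rule of thumb (an assumption, not a figure from this article) that training cost is roughly 6 Ă— parameters Ă— tokens FLOP, checks that the order of magnitude of the table's figures is plausible.

```python
# Unit check: 1 petaFLOP-day = 1e15 FLOP/s * 86,400 s = 8.64e19 FLOP.
PFLOP_DAY = 1e15 * 86_400
print(f"{PFLOP_DAY:.3e}")                         # 8.640e+19

# Rule-of-thumb estimate (assumption): training FLOPs ~ 6 * N * D.
def train_cost_pflop_days(params: float, tokens: float) -> float:
    return 6 * params * tokens / PFLOP_DAY

# A 70B-parameter model on 2T tokens vs. the table's 21,000 petaFLOP-day figure.
print(round(train_cost_pflop_days(70e9, 2e12)))   # ~9,700: same order of magnitude
```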
Model weights for the first version of Llama were made available to the research community under a non-commercial license, and access was granted on a case-by-case basis. Unauthorized copies of the model were shared via BitTorrent. In response, Meta AI issued DMCA takedown requests against repositories sharing the link on GitHub.
Zoom used Meta Llama 2 to create an AI Companion that can summarize meetings, provide helpful presentation tips, and assist with message responses. This AI Companion is powered by multiple models, including Meta Llama 2.
Llama 2 includes both foundation models and models fine-tuned for chat. In a further departure from LLaMA, all models are released with weights and are free for many commercial use cases. However, due to some remaining restrictions, Meta's description of Llama as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).
On March 3, 2023, a torrent containing LLaMA's weights was uploaded, with a link to the torrent shared on the 4chan imageboard and subsequently spread through online AI communities. That same day, a pull request on the main LLaMA repository was opened, requesting to add the magnet link to the official documentation. On March 4, a pull request was opened to add links to HuggingFace repositories containing the model. On March 6, Meta filed takedown requests to remove the HuggingFace repositories linked in the pull request, characterizing it as "unauthorized distribution" of the model. HuggingFace complied with the requests. On March 20, Meta filed a DMCA takedown request for copyright infringement against a repository containing a script that downloaded LLaMA from a mirror, and GitHub complied the next day.
The Llama 3 dataset consists of mainly English data, with over 5% in over 30 other languages. The dataset was filtered by a text-quality classifier, and the classifier was itself trained on text synthesized by Llama 2.
Llama was trained only on publicly available information, and was trained at various model sizes with the intention of making it more accessible to different hardware. Meta AI reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters), and that the largest 65B model was competitive with state-of-the-art models such as PaLM and Chinchilla.
The response to Meta's integration of Llama into Facebook was mixed, with some users confused after Meta AI told a parental group that it had a child.
After the release of large language models such as GPT-3, a focus of research was up-scaling models, which in some instances showed major increases in emergent capabilities. The release of ChatGPT and its surprise success caused an increase in attention to large language models. Compared with other responses to ChatGPT, Meta's chief AI scientist Yann LeCun stated that large language models are best for aiding with writing.
1548:"Save bandwidth by using a torrent to distribute more efficiently by ChristopherKing42 · Pull Request #73 · facebookresearch/llama"
281:
2453:
873:
foundational models were trained on a data set with 1.4 trillion tokens, drawn from publicly available data sources, including:
On April 17, 2023, TogetherAI launched a project named RedPajama to reproduce and distribute an open-source version of the LLaMA dataset. The dataset has approximately 1.2 trillion tokens and is publicly available for download.
Wired describes the 8B parameter version of Llama 3 as being "surprisingly capable" given its size.
Meditron is a family of Llama-based models fine-tuned on a corpus of clinical guidelines, PubMed papers, and articles. It was created by researchers at the École Polytechnique Fédérale de Lausanne School of Computer and Communication Sciences and the Yale School of Medicine. It shows increased performance on medical-related benchmarks such as MedQA and MedMCQA.
Alongside the release of Llama 3, Meta added virtual assistant features to Facebook and WhatsApp in select regions, as well as a standalone website. Both services use a Llama 3 model.
Like GPT-3, the Llama series of models are decoder-only Transformers, but there are some minor differences: the SwiGLU activation function is used instead of GeLU; rotary positional embeddings (RoPE) are used instead of absolute positional embedding; and RMSNorm is used instead of layer normalization.
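For illustration, the sketch below implements RMSNorm and rotary positional embeddings in plain PyTorch. The tensor shapes are illustrative, and the default base θ is simply set to 500,000 (the value listed for Llama 3.1 in the hyperparameter table); this is not an excerpt of Meta's code.

```python
# Sketch of two Llama architecture components: RMSNorm and rotary embeddings (RoPE).
# Shapes and the theta base are illustrative; this is not Meta's implementation.
import torch

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Root-mean-square normalization: rescale by 1/RMS(x), no mean subtraction."""
    rms = x.pow(2).mean(dim=-1, keepdim=True).add(eps).rsqrt()
    return x * rms * weight

def apply_rope(x: torch.Tensor, theta: float = 500_000.0) -> torch.Tensor:
    """Rotate pairs of channels by position-dependent angles (rotary embeddings)."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    half = head_dim // 2
    freqs = 1.0 / (theta ** (torch.arange(0, half, dtype=torch.float32) / half))
    angles = torch.outer(torch.arange(seq_len, dtype=torch.float32), freqs)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin, x1 * sin + x2 * cos], dim=-1)

# Toy usage: one attention head, sequence of 16 tokens, head dimension 8.
h = torch.randn(16, 8)
print(rms_norm(h, weight=torch.ones(8)).shape)  # torch.Size([16, 8])
print(apply_rope(h).shape)                      # torch.Size([16, 8])
```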
2217:"EPFL's new Large Language Model for Medical Knowledge"
1802:
1599:"Facebook's Powerful Large Language Model Leaks Online"
1373:"Examining Emergent Abilities in Large Language Models"
1013:
School of Computer and Communication Sciences, and the
1009:
papers, and articles. It was created by researchers at
264:
An empirical investigation of the Llama series was the
2137:"Stanford takes costly, risky Alpaca AI model offline"
1355:"Meet Your New Assistant: Meta AI, Built With Llama 3"
849:{\displaystyle \operatorname {RoPE} (\theta =500,000)}
1836:"The Falcon has landed in the Hugging Face ecosystem"
818:
2385:"Meta's amped-up AI agents confusing Facebook users"
2085:. Stanford Center for Research on Foundation Models.
1523:"Meta's LLaMA Leaked to the Public, Thanks To 4chan"
645:– Output of 65 billion parameter LLaMA model before
2048:
246:After the release of large language models such as
2095:
1898:"llama3/MODEL_CARD.md at main · meta-llama/llama3"
848:
2332:
2007:
1885:The model card has some more interesting info too
2498:
2450:
1860:"llama/MODEL_CARD.md at main · meta-llama/llama"
1335:Peters, Jay; Vincent, James (24 February 2023).
1261:
677:(RoPE) instead of absolute positional embedding;
2314:"Quantize Llama models with GGUF and llama.cpp"
1965:
1218:Malik, Yuvraj; Paul, Katie (25 February 2023).
914:source code for scientific papers uploaded to
373:Llama 2 includes foundation models and models
1986:
1646:
1516:
1514:
1463:
1461:
1459:
1334:
1188:
1186:
1184:
883:Open source repositories of source code from
1304:
1302:
1300:
1257:
1255:
1235:
1233:
1231:
1229:
1115:"llama3/LICENSE at main · meta-llama/llama3"
1038:Software developer Georgi Gerganov released
612:
299:benchmarks exceeded that of the much larger
2165:
2022:
1642:
1640:
1638:
1511:
1456:
1181:
973:(PPO) for RLHF – a new technique based on
967:Reinforcement learning from human feedback
2101:
2013:
1992:
1971:
1950:
1814:
1717:
1652:
1425:
1297:
1252:
1226:
1217:
1147:
1136:
1134:
1132:
1130:
1128:
2214:
1622:
1616:
1011:École Polytechnique Fédérale de Lausanne
663:, but there are some minor differences:
2311:
2261:
1944:
1732:
1667:
1635:
1239:
427:
2499:
2467:from the original on September 5, 2023
1165:
1163:
1161:
1159:
1125:
995:Human-Centered Artificial Intelligence
366:On July 18, 2023, in partnership with
2338:
1798:
1796:
1770:
1540:
1284:
2312:Labonne, Maxime (29 November 2023).
2245:"How Companies Are Using Meta Llama"
2215:Petersen, Tanya (28 November 2023).
1751:
1486:
861:
1596:
1426:Badminton, Nik (13 February 2023).
1156:
13:
2443:
2412:
2364:
1828:
1793:
1520:
1262:OpSec Online LLC (21 March 2023).
275:
222:Alongside the release of Llama 3,
14:
2533:
2481:
2134:
1623:Willison, Simon (11 March 2023).
1285:David, Emilia (30 October 2023).
Key hyperparameters of Llama 3.1

                       8B         70B        405B
Model Dimension        4,096      8,192      16,384
FFN Dimension          14,336     28,672     53,248
Layers                 32         80         126
Attention Heads        32         64         128
Key/Value Heads        8          8          8
Peak Learning Rate     3 Ă— 10^-4  1.5 Ă— 10^-4  0.8 Ă— 10^-4
Activation Function    SwiGLU (all sizes)
Vocabulary Size        128,000 (all sizes)
Positional Embeddings  RoPE (θ = 500,000) (all sizes)