Text-to-image model

A text-to-image model is a machine learning model which takes an input natural language description and produces an image matching that description. Text-to-image models began to be developed in the mid-2010s during the beginnings of the AI boom, as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models such as OpenAI's DALL-E 2, Google Brain's Imagen, Stability AI's Stable Diffusion, and Midjourney began to be considered to approach the quality of real photographs and human-drawn art.

Text-to-image models are generally latent diffusion models, which combine a language model, which transforms the input text into a latent representation, and a generative image model, which produces an image conditioned on that representation. The most effective models have generally been trained on massive amounts of image and text data scraped from the web.

An image conditioned on the prompt "an astronaut riding a horse, by Hiroshige", generated by Stable Diffusion, a large-scale text-to-image model released in 2022

History

Before the rise of deep learning, attempts to build text-to-image models were limited to collages made by arranging existing component images, such as from a database of clip art. The inverse task, image captioning, was more tractable, and a number of image captioning deep learning models came prior to the first text-to-image models.

The first modern text-to-image model, alignDRAW, was introduced in 2015 by researchers from the University of Toronto. alignDRAW extended the previously introduced DRAW architecture (which used a recurrent variational autoencoder with an attention mechanism) to be conditioned on text sequences. Images generated by alignDRAW were of small resolution (32×32 pixels, attained by resizing) and were considered to be "low in diversity". The model was nevertheless able to generalize to objects not represented in the training data (such as a red school bus) and appropriately handled novel prompts such as "a stop sign is flying in blue skies", demonstrating that it was not merely "memorizing" data from the training set.

Eight images generated from the text prompt "A stop sign is flying in blue skies." by alignDRAW (2015), enlarged to show detail

In 2016, Reed, Akata, Yan et al. became the first to use generative adversarial networks for the text-to-image task. With models trained on narrow, domain-specific datasets, they were able to generate "visually plausible" images of birds and flowers from text captions like "an all black bird with a distinct thick, rounded bill". A model trained on the more diverse COCO (Common Objects in Context) dataset produced images which were "from a distance... encouraging", but which lacked coherence in their details. Later systems include VQGAN-CLIP, XMC-GAN, and GauGAN2.

DALL·E 2's (top, April 2022) and DALL·E 3's (bottom, September 2023) generated images for the prompt "A stop sign is flying in blue skies"

One of the first text-to-image models to capture widespread public attention was OpenAI's DALL-E, a transformer
system announced in January 2021. A successor capable of generating more complex and realistic images, DALL-E 2, was unveiled in April 2022, followed by
Stable Diffusion, which was publicly released in August 2022.

In August 2022, text-to-image personalization was introduced, which makes it possible to teach the model a new concept using a small set of images of a new object that was not included in the training set of the text-to-image foundation model. This is achieved by textual inversion, namely finding a new text term that corresponds to these images.
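
The idea of textual inversion can be made concrete with a short sketch. The following PyTorch snippet is a minimal toy illustration rather than the published method: the embedding table and the generator are hypothetical stand-ins for a frozen foundation model, random tensors stand in for the photos of the new concept, and a plain reconstruction loss replaces the diffusion training objective. Only the embedding vector of the new pseudo-token is optimized.

    # Toy sketch of textual inversion: learn an embedding for a new
    # pseudo-token while every pretrained weight stays frozen.
    import torch
    from torch import nn

    torch.manual_seed(0)
    embed_dim, image_dim = 32, 64

    # Frozen pieces of a (hypothetical) pretrained text-to-image pipeline.
    token_embeddings = nn.Embedding(1000, embed_dim)  # pretrained vocabulary
    generator = nn.Linear(embed_dim, image_dim)       # stand-in image model
    for p in list(token_embeddings.parameters()) + list(generator.parameters()):
        p.requires_grad = False

    # A small set of images of the new concept (random stand-ins here).
    concept_images = torch.randn(5, image_dim)

    # The only trainable parameter: the embedding of the new pseudo-token.
    new_token = nn.Parameter(token_embeddings.weight.mean(dim=0).clone())

    opt = torch.optim.Adam([new_token], lr=1e-2)
    for step in range(200):
        opt.zero_grad()
        # Generate from the new token and pull the output toward the concept
        # images; a real system would use the denoising loss of the diffusion
        # model rather than this plain reconstruction loss.
        pred = generator(new_token).expand_as(concept_images)
        loss = nn.functional.mse_loss(pred, concept_images)
        loss.backward()
        opt.step()

    print(f"final reconstruction loss: {loss.item():.4f}")

After such an optimization, the learned vector can be substituted wherever the pseudo-token appears in a prompt, letting the frozen model render the new concept in novel contexts.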

Following other text-to-image models, language model-powered text-to-video
platforms such as Runway, Make-A-Video, Imagen Video, Midjourney, and Phenaki can generate video from text and/or text/image prompts.

Architecture and training

Text-to-image models have been built using a variety of architectures. The text encoding step may be performed with a
recurrent neural network such as a long short-term memory (LSTM) network, though transformer models have since become a more popular option. For the image generation step, conditional generative adversarial networks (GANs) have been commonly used, with diffusion models also becoming a popular option in recent years. Rather than directly training a model to output a high-resolution image conditioned on a text embedding, a popular technique is to train a model to generate low-resolution images and to use one or more auxiliary deep learning models to upscale it, filling in finer details.

Text-to-image models are trained on large datasets of (text, image) pairs, often scraped from the web. With their 2022 Imagen model, Google Brain reported positive results from using a large language model trained separately on a text-only corpus (with its weights subsequently frozen), a departure from the theretofore standard approach.
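
The data flow of such a cascade can be sketched in a few lines of PyTorch. The modules below are deliberately trivial stand-ins invented for this example (linear and convolutional layers, not any published architecture): a frozen text encoder conditions a low-resolution base generator, and an auxiliary super-resolution model upscales the draft while filling in detail.

    # Illustrative cascade: frozen text encoder -> 64x64 base image -> 256x256.
    import torch
    from torch import nn

    class TextEncoder(nn.Module):
        # Stands in for a language model pretrained on a text-only corpus.
        def __init__(self, vocab=1000, dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
        def forward(self, token_ids):
            return self.embed(token_ids).mean(dim=1)  # one vector per prompt

    class BaseGenerator(nn.Module):
        # Maps (noise, text embedding) to a low-resolution 64x64 image.
        def __init__(self, dim=128):
            super().__init__()
            self.net = nn.Linear(dim * 2, 3 * 64 * 64)
        def forward(self, noise, text_emb):
            x = torch.cat([noise, text_emb], dim=-1)
            return self.net(x).view(-1, 3, 64, 64)

    class Upscaler(nn.Module):
        # Auxiliary model that fills in finer details at 256x256.
        def __init__(self):
            super().__init__()
            self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        def forward(self, low_res):
            up = nn.functional.interpolate(low_res, scale_factor=4, mode="bilinear")
            return up + self.refine(up)  # residual detail prediction

    encoder, base, upscaler = TextEncoder(), BaseGenerator(), Upscaler()
    encoder.requires_grad_(False)  # text encoder weights kept frozen

    token_ids = torch.randint(0, 1000, (1, 8))  # a tokenized prompt
    text_emb = encoder(token_ids)
    low = base(torch.randn(1, 128), text_emb)   # 64x64 draft image
    high = upscaler(low)                        # upscaled 256x256 output
    print(low.shape, high.shape)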

Datasets

Training a text-to-image model requires a dataset of images paired with text captions. One dataset commonly used for this purpose is the COCO (Common Objects in Context) dataset. Released by Microsoft in 2014, COCO consists of around 123,000 images depicting a diversity of objects, with five captions per image generated by human annotators. Oxford-102 Flowers and CUB-200 Birds are smaller datasets of around 10,000 images each, restricted to flowers and birds, respectively. It is considered less difficult to train a high-quality text-to-image model with these datasets because of their narrow range of subject matter.

Examples of images and captions from three public datasets which are commonly used to train text-to-image models
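
In practice, such (image, caption) pairs can be loaded with standard tooling. The sketch below uses torchvision's CocoCaptions dataset; the local paths are hypothetical, the 2014 images and caption annotations must already be downloaded, and the loader additionally requires the pycocotools package.

    # Sketch: load COCO (image, caption) training pairs with torchvision.
    import random
    from torchvision import transforms
    from torchvision.datasets import CocoCaptions

    dataset = CocoCaptions(
        root="coco/train2014",                               # image directory (hypothetical path)
        annFile="coco/annotations/captions_train2014.json",  # captions (hypothetical path)
        transform=transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
        ]),
    )

    image, captions = dataset[0]       # COCO provides five captions per image
    caption = random.choice(captions)  # sample one caption for a training step
    print(image.shape, caption)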

Quality evaluation

Evaluating and comparing the quality of text-to-image models is a problem involving assessing multiple desirable properties. A desideratum specific to text-to-image models is that generated images semantically align with the text captions used to generate them. A number of schemes have been devised for assessing these qualities, some automated and others based on human judgement.

A common algorithmic metric for assessing image quality and diversity is the Inception Score (IS), which is based on the distribution of labels predicted by a pretrained Inceptionv3 image classification model when applied to a sample of images generated by the text-to-image model. The score is increased when the image classification model predicts a single label with high probability, a scheme intended to favour "distinct" generated images. Another popular metric is the related Fréchet inception distance (FID), which compares the distribution of generated images and real training images according to features extracted by one of the final layers of a pretrained image classification model.
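
Both metrics can be computed from the outputs of the pretrained classifier alone. The following NumPy/SciPy sketch spells out the arithmetic; the random arrays are stand-ins for the classifier's label probabilities and pooled features, which in practice come from Inceptionv3.

    # Sketch of Inception Score (IS) and Frechet inception distance (FID).
    import numpy as np
    from scipy import linalg

    rng = np.random.default_rng(0)

    # IS = exp( mean KL( p(y|x) || p(y) ) ) over generated images x.
    probs = rng.dirichlet(np.ones(1000), size=500)  # stand-in for p(y|x)
    marginal = probs.mean(axis=0)                   # p(y)
    kl = (probs * (np.log(probs) - np.log(marginal))).sum(axis=1)
    inception_score = float(np.exp(kl.mean()))

    # FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
    # where (mu, S) are the mean and covariance of classifier features.
    real = rng.normal(size=(500, 64))  # stand-in features, real images
    fake = rng.normal(size=(500, 64))  # stand-in features, generated images
    mu_r, mu_g = real.mean(axis=0), fake.mean(axis=0)
    s_r = np.cov(real, rowvar=False)
    s_g = np.cov(fake, rowvar=False)
    covmean = linalg.sqrtm(s_r @ s_g).real  # matrix square root
    fid = float(((mu_r - mu_g) ** 2).sum() + np.trace(s_r + s_g - 2 * covmean))

    print(f"IS = {inception_score:.2f}, FID = {fid:.2f}")

A higher Inception Score is better, while a lower FID indicates that the distribution of generated images is closer to that of the real training images.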

Impact and applications

This section is an excerpt from Artificial intelligence art § Impact and applications.

AI has the potential for a societal transformation, which may include enabling the expansion of noncommercial niche genres (such as cyberpunk derivatives like solarpunk) by amateurs, novel entertainment, fast prototyping, increasing art-making accessibility, and greater artistic output per unit of effort, expense, or time, for example via generating drafts, draft refinements, and image components (inpainting). Generated images are sometimes used as sketches, low-cost experiments, inspiration, or illustrations of proof-of-concept-stage ideas. Additional functionalities or improvements may also relate to post-generation manual editing (i.e., polishing), such as subsequent tweaking with an image editor.

List of notable text-to-image models

Name              Developer         Release date    License
DALL-E            OpenAI            January 2021    Proprietary
DALL-E 2          OpenAI            April 2022      Proprietary
DALL-E 3          OpenAI            September 2023  Proprietary
Ideogram 2.0      Ideogram          August 2024     Proprietary
Imagen            Google            -               Proprietary
Imagen 2          Google            December 2023   Proprietary
Parti             Google            Unreleased      -
Firefly           Adobe Inc.        June 2023       Proprietary
Midjourney        Midjourney, Inc.  July 2022       Proprietary
Stable Diffusion  Stability AI      August 2022     CreativeML Open RAIL-M
RunwayML          Runway AI, Inc.   2018            Proprietary

See also

Artificial intelligence art

References

Mansimov, Elman; Parisotto, Emilio; Lei Ba, Jimmy; Salakhutdinov, Ruslan (November 2015). "Generating Images from Captions with Attention". arXiv:1511.02793.
Mansimov, Elman; Parisotto, Emilio; Ba, Jimmy Lei; Salakhutdinov, Ruslan (29 February 2016). "Generating Images from Captions with Attention". International Conference on Learning Representations.
Zhu, Xiaojin; Goldberg, Andrew B.; Eldawy, Mohamed; Dyer, Charles R.; Strock, Bradley (2007). "A text-to-picture synthesis system for augmenting communication". AAAI. 7: 1590–1595.
Reed, Scott; Akata, Zeynep; Logeswaran, Lajanugen; Schiele, Bernt; Lee, Honglak (June 2016). "Generative Adversarial Text to Image Synthesis". International Conference on Machine Learning. arXiv:1605.05396.
Agnese, Jorge; Herrera, Jonathan; Tao, Haicheng; Zhu, Xingquan (October 2019). "A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis". arXiv:1910.09399.
Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas (December 2021). "Adversarial text-to-image synthesis: A review". Neural Networks. 144: 187–209. arXiv:2101.09983. doi:10.1016/j.neunet.2021.07.019. PMID 34500257. S2CID 231698782.
Coldewey, Devin (5 January 2021). "OpenAI's DALL-E creates plausible images of literally anything you ask it to". TechCrunch.
Coldewey, Devin (6 April 2022). "OpenAI's new DALL-E model draws anything — but bigger, better and faster than before". TechCrunch.
Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Ghasemipour, Seyed Kamyar Seyed; Karagol Ayan, Burcu; Mahdavi, S. Sara; Gontijo Lopes, Raphael; Salimans, Tim; Ho, Jonathan; Fleet, David J.; Norouzi, Mohammad (23 May 2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487.
Vincent, James (24 May 2022). "All these images were generated by Google's latest text-to-image AI". The Verge. Vox Media.
"Stable Diffusion Public Release". Stability.Ai.
Rodriguez, Jesus (27 September 2022). "🌅 Edge#229: VQGAN + CLIP". thesequence.substack.com.
Rodriguez, Jesus (4 October 2022). "🎆🌆 Edge#231: Text-to-Image Synthesis with GANs". thesequence.substack.com.
Rodriguez, Jesus (25 October 2022). "🎨 Edge#237: What is Midjourney?". thesequence.substack.com.
Kumar, Ashish (3 October 2022). "Meta AI Introduces 'Make-A-Video': An Artificial Intelligence System That Generates Videos From Text". MarkTechPost.
Edwards, Benj (9 September 2022). "Runway teases AI-powered text-to-video editing using written prompts". Ars Technica.
Edwards, Benj (5 October 2022). "Google's newest AI generator creates HD video from text prompts". Ars Technica.
"Phenaki". phenaki.video.
Leswing, Kif. "Why Silicon Valley is so excited about awkward drawings done by artificial intelligence". CNBC.
Roose, Kevin (21 October 2022). "A.I.-Generated Art Is Already Transforming Creative Work". The New York Times.
Elgan, Mike (1 November 2022). "How 'synthetic media' will transform business forever". Computerworld.
"Imagen 2 on Vertex AI is now generally available". Google Cloud Blog.

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.