Text-to-image model

A text-to-image model is a machine learning model which takes an input natural language description and produces an image matching that description. Text-to-image models began to be developed in the mid-2010s during the beginnings of the AI boom, as a result of advances in deep neural networks. In 2022, the output of state-of-the-art text-to-image models such as OpenAI's DALL-E 2, Google Brain's Imagen, Stability AI's Stable Diffusion, and Midjourney began to be considered to approach the quality of real photographs and human-drawn art.

Text-to-image models are generally latent diffusion models, which combine a language model, which transforms the input text into a latent representation, and a generative image model, which produces an image conditioned on that representation. The most effective models have generally been trained on massive amounts of image and text data scraped from the web.

An image conditioned on the prompt "an astronaut riding a horse, by Hiroshige", generated by Stable Diffusion, a large-scale text-to-image model released in 2022

History

Before the rise of deep learning, attempts to build text-to-image models were limited to collages made by arranging existing component images, such as from a database of clip art. The inverse task, image captioning, was more tractable, and a number of image captioning deep learning models came prior to the first text-to-image models.

The first modern text-to-image model, alignDRAW, was introduced in 2015 by researchers from the University of Toronto. alignDRAW extended the previously introduced DRAW architecture (which used a recurrent variational autoencoder with an attention mechanism) to be conditioned on text sequences. Images generated by alignDRAW were of small resolution (32×32 pixels, attained by resizing) and were considered to be "low in diversity". The model was nevertheless able to generalize to objects not represented in the training data (such as a red school bus) and appropriately handled novel prompts such as "a stop sign is flying in blue skies", demonstrating that it was not merely "memorizing" data from the training set.

Eight images generated from the text prompt "A stop sign is flying in blue skies." by alignDRAW (2015), enlarged to show detail

In 2016, Reed, Akata, Yan et al. became the first to use generative adversarial networks for the text-to-image task. With models trained on narrow, domain-specific datasets, they were able to generate "visually plausible" images of birds and flowers from text captions like "an all black bird with a distinct thick, rounded bill". A model trained on the more diverse COCO (Common Objects in Context) dataset produced images which were "from a distance... encouraging", but which lacked coherence in their details. Later systems include VQGAN-CLIP, XMC-GAN, and GauGAN2.

DALL·E 2's (top, April 2022) and DALL·E 3's (bottom, September 2023) generated images for the prompt "A stop sign is flying in blue skies"

One of the first text-to-image models to capture widespread public attention was OpenAI's DALL-E, a transformer
system announced in January 2021. A successor capable of generating more complex and realistic images, DALL-E 2, was unveiled in April 2022, followed by
Stable Diffusion, which was publicly released in August 2022.

In August 2022, text-to-image personalization was introduced, which makes it possible to teach the model a new concept using a small set of images of a new object that was not included in the training set of the text-to-image foundation model. This is achieved by textual inversion, namely finding a new text term that corresponds to these images.
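
The idea of textual inversion can be made concrete with a short sketch. The following PyTorch snippet is a minimal toy illustration rather than the published method: the embedding table and the generator are hypothetical stand-ins for a frozen foundation model, random tensors stand in for the photos of the new concept, and a plain reconstruction loss replaces the diffusion training objective. Only the embedding vector of the new pseudo-token is optimized.

    # Toy sketch of textual inversion: learn an embedding for a new
    # pseudo-token while every pretrained weight stays frozen.
    import torch
    from torch import nn

    torch.manual_seed(0)
    embed_dim, image_dim = 32, 64

    # Frozen pieces of a (hypothetical) pretrained text-to-image pipeline.
    token_embeddings = nn.Embedding(1000, embed_dim)  # pretrained vocabulary
    generator = nn.Linear(embed_dim, image_dim)       # stand-in image model
    for p in list(token_embeddings.parameters()) + list(generator.parameters()):
        p.requires_grad = False

    # A small set of images of the new concept (random stand-ins here).
    concept_images = torch.randn(5, image_dim)

    # The only trainable parameter: the embedding of the new pseudo-token.
    new_token = nn.Parameter(token_embeddings.weight.mean(dim=0).clone())

    opt = torch.optim.Adam([new_token], lr=1e-2)
    for step in range(200):
        opt.zero_grad()
        # Generate from the new token and pull the output toward the concept
        # images; a real system would use the denoising loss of the diffusion
        # model rather than this plain reconstruction loss.
        pred = generator(new_token).expand_as(concept_images)
        loss = nn.functional.mse_loss(pred, concept_images)
        loss.backward()
        opt.step()

    print(f"final reconstruction loss: {loss.item():.4f}")

After such an optimization, the learned vector can be substituted wherever the pseudo-token appears in a prompt, letting the frozen model render the new concept in novel contexts.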

Following other text-to-image models, language model-powered text-to-video
platforms such as Runway, Make-A-Video, Imagen Video, Midjourney, and Phenaki can generate video from text and/or text/image prompts.

Architecture and training

Text-to-image models have been built using a variety of architectures. The text encoding step may be performed with a
recurrent neural network such as a long short-term memory (LSTM) network, though transformer models have since become a more popular option. For the image generation step, conditional generative adversarial networks (GANs) have been commonly used, with diffusion models also becoming a popular option in recent years. Rather than directly training a model to output a high-resolution image conditioned on a text embedding, a popular technique is to train a model to generate low-resolution images and to use one or more auxiliary deep learning models to upscale it, filling in finer details.

Text-to-image models are trained on large datasets of (text, image) pairs, often scraped from the web. With their 2022 Imagen model, Google Brain reported positive results from using a large language model trained separately on a text-only corpus (with its weights subsequently frozen), a departure from the theretofore standard approach.
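
The data flow of such a cascade can be sketched in a few lines of PyTorch. The modules below are deliberately trivial stand-ins invented for this example (linear and convolutional layers, not any published architecture): a frozen text encoder conditions a low-resolution base generator, and an auxiliary super-resolution model upscales the draft while filling in detail.

    # Illustrative cascade: frozen text encoder -> 64x64 base image -> 256x256.
    import torch
    from torch import nn

    class TextEncoder(nn.Module):
        # Stands in for a language model pretrained on a text-only corpus.
        def __init__(self, vocab=1000, dim=128):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
        def forward(self, token_ids):
            return self.embed(token_ids).mean(dim=1)  # one vector per prompt

    class BaseGenerator(nn.Module):
        # Maps (noise, text embedding) to a low-resolution 64x64 image.
        def __init__(self, dim=128):
            super().__init__()
            self.net = nn.Linear(dim * 2, 3 * 64 * 64)
        def forward(self, noise, text_emb):
            x = torch.cat([noise, text_emb], dim=-1)
            return self.net(x).view(-1, 3, 64, 64)

    class Upscaler(nn.Module):
        # Auxiliary model that fills in finer details at 256x256.
        def __init__(self):
            super().__init__()
            self.refine = nn.Conv2d(3, 3, kernel_size=3, padding=1)
        def forward(self, low_res):
            up = nn.functional.interpolate(low_res, scale_factor=4, mode="bilinear")
            return up + self.refine(up)  # residual detail prediction

    encoder, base, upscaler = TextEncoder(), BaseGenerator(), Upscaler()
    encoder.requires_grad_(False)  # text encoder weights kept frozen

    token_ids = torch.randint(0, 1000, (1, 8))  # a tokenized prompt
    text_emb = encoder(token_ids)
    low = base(torch.randn(1, 128), text_emb)   # 64x64 draft image
    high = upscaler(low)                        # upscaled 256x256 output
    print(low.shape, high.shape)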

Datasets

Training a text-to-image model requires a dataset of images paired with text captions. One dataset commonly used for this purpose is the COCO (Common Objects in Context) dataset. Released by Microsoft in 2014, COCO consists of around 123,000 images depicting a diversity of objects, with five captions per image generated by human annotators. Oxford-102 Flowers and CUB-200 Birds are smaller datasets of around 10,000 images each, restricted to flowers and birds, respectively. It is considered less difficult to train a high-quality text-to-image model with these datasets because of their narrow range of subject matter.

Examples of images and captions from three public datasets which are commonly used to train text-to-image models
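
In practice, such (image, caption) pairs can be loaded with standard tooling. The sketch below uses torchvision's CocoCaptions dataset; the local paths are hypothetical, the 2014 images and caption annotations must already be downloaded, and the loader additionally requires the pycocotools package.

    # Sketch: load COCO (image, caption) training pairs with torchvision.
    import random
    from torchvision import transforms
    from torchvision.datasets import CocoCaptions

    dataset = CocoCaptions(
        root="coco/train2014",                               # image directory (hypothetical path)
        annFile="coco/annotations/captions_train2014.json",  # captions (hypothetical path)
        transform=transforms.Compose([
            transforms.Resize((256, 256)),
            transforms.ToTensor(),
        ]),
    )

    image, captions = dataset[0]       # COCO provides five captions per image
    caption = random.choice(captions)  # sample one caption for a training step
    print(image.shape, caption)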

Quality evaluation

Evaluating and comparing the quality of text-to-image models is a problem involving assessing multiple desirable properties. A desideratum specific to text-to-image models is that generated images semantically align with the text captions used to generate them. A number of schemes have been devised for assessing these qualities, some automated and others based on human judgement.

A common algorithmic metric for assessing image quality and diversity is the Inception Score (IS), which is based on the distribution of labels predicted by a pretrained Inceptionv3 image classification model when applied to a sample of images generated by the text-to-image model. The score is increased when the image classification model predicts a single label with high probability, a scheme intended to favour "distinct" generated images. Another popular metric is the related Fréchet inception distance (FID), which compares the distribution of generated images and real training images according to features extracted by one of the final layers of a pretrained image classification model.
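
Both metrics can be computed from the outputs of the pretrained classifier alone. The following NumPy/SciPy sketch spells out the arithmetic; the random arrays are stand-ins for the classifier's label probabilities and pooled features, which in practice come from Inceptionv3.

    # Sketch of Inception Score (IS) and Frechet inception distance (FID).
    import numpy as np
    from scipy import linalg

    rng = np.random.default_rng(0)

    # IS = exp( mean KL( p(y|x) || p(y) ) ) over generated images x.
    probs = rng.dirichlet(np.ones(1000), size=500)  # stand-in for p(y|x)
    marginal = probs.mean(axis=0)                   # p(y)
    kl = (probs * (np.log(probs) - np.log(marginal))).sum(axis=1)
    inception_score = float(np.exp(kl.mean()))

    # FID = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)),
    # where (mu, S) are the mean and covariance of classifier features.
    real = rng.normal(size=(500, 64))  # stand-in features, real images
    fake = rng.normal(size=(500, 64))  # stand-in features, generated images
    mu_r, mu_g = real.mean(axis=0), fake.mean(axis=0)
    s_r = np.cov(real, rowvar=False)
    s_g = np.cov(fake, rowvar=False)
    covmean = linalg.sqrtm(s_r @ s_g).real  # matrix square root
    fid = float(((mu_r - mu_g) ** 2).sum() + np.trace(s_r + s_g - 2 * covmean))

    print(f"IS = {inception_score:.2f}, FID = {fid:.2f}")

A higher Inception Score is better, while a lower FID indicates that the distribution of generated images is closer to that of the real training images.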

Impact and applications

This section is an excerpt from Artificial intelligence art § Impact and applications.

AI has the potential for a societal transformation, which may include enabling the expansion of noncommercial niche genres (such as cyberpunk derivatives like solarpunk) by amateurs, novel entertainment, fast prototyping, increasing art-making accessibility, and greater artistic output per unit of effort, expense, or time, for example via generating drafts, draft refinements, and image components (inpainting). Generated images are sometimes used as sketches, low-cost experiments, inspiration, or illustrations of proof-of-concept-stage ideas. Additional functionalities or improvements may also relate to post-generation manual editing (i.e., polishing), such as subsequent tweaking with an image editor.

List of notable text-to-image models

Name              Developer         Release date    License
DALL-E            OpenAI            January 2021    Proprietary
DALL-E 2          OpenAI            April 2022      Proprietary
DALL-E 3          OpenAI            September 2023  Proprietary
Ideogram 2.0      Ideogram          August 2024     Proprietary
Imagen            Google            -               Proprietary
Imagen 2          Google            December 2023   Proprietary
Parti             Google            Unreleased      -
Firefly           Adobe Inc.        June 2023       Proprietary
Midjourney        Midjourney, Inc.  July 2022       Proprietary
Stable Diffusion  Stability AI      August 2022     CreativeML Open RAIL-M
RunwayML          Runway AI, Inc.   2018            Proprietary

See also

Artificial intelligence art

References

Mansimov, Elman; Parisotto, Emilio; Lei Ba, Jimmy; Salakhutdinov, Ruslan (November 2015). "Generating Images from Captions with Attention". arXiv:1511.02793.
Mansimov, Elman; Parisotto, Emilio; Ba, Jimmy Lei; Salakhutdinov, Ruslan (29 February 2016). "Generating Images from Captions with Attention". International Conference on Learning Representations.
Zhu, Xiaojin; Goldberg, Andrew B.; Eldawy, Mohamed; Dyer, Charles R.; Strock, Bradley (2007). "A text-to-picture synthesis system for augmenting communication". AAAI. 7: 1590–1595.
Reed, Scott; Akata, Zeynep; Logeswaran, Lajanugen; Schiele, Bernt; Lee, Honglak (June 2016). "Generative Adversarial Text to Image Synthesis". International Conference on Machine Learning. arXiv:1605.05396.
Agnese, Jorge; Herrera, Jonathan; Tao, Haicheng; Zhu, Xingquan (October 2019). "A Survey and Taxonomy of Adversarial Neural Networks for Text-to-Image Synthesis". arXiv:1910.09399.
Frolov, Stanislav; Hinz, Tobias; Raue, Federico; Hees, Jörn; Dengel, Andreas (December 2021). "Adversarial text-to-image synthesis: A review". Neural Networks. 144: 187–209. arXiv:2101.09983. doi:10.1016/j.neunet.2021.07.019. PMID 34500257. S2CID 231698782.
Coldewey, Devin (5 January 2021). "OpenAI's DALL-E creates plausible images of literally anything you ask it to". TechCrunch.
Coldewey, Devin (6 April 2022). "OpenAI's new DALL-E model draws anything — but bigger, better and faster than before". TechCrunch.
Saharia, Chitwan; Chan, William; Saxena, Saurabh; Li, Lala; Whang, Jay; Denton, Emily; Ghasemipour, Seyed Kamyar Seyed; Karagol Ayan, Burcu; Mahdavi, S. Sara; Gontijo Lopes, Raphael; Salimans, Tim; Ho, Jonathan; Fleet, David J.; Norouzi, Mohammad (23 May 2022). "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding". arXiv:2205.11487.
Vincent, James (24 May 2022). "All these images were generated by Google's latest text-to-image AI". The Verge. Vox Media.
"Stable Diffusion Public Release". Stability.Ai.
Rodriguez, Jesus (27 September 2022). "🌅 Edge#229: VQGAN + CLIP". thesequence.substack.com.
Rodriguez, Jesus (4 October 2022). "🎆🌆 Edge#231: Text-to-Image Synthesis with GANs". thesequence.substack.com.
Rodriguez, Jesus (25 October 2022). "🎨 Edge#237: What is Midjourney?". thesequence.substack.com.
Kumar, Ashish (3 October 2022). "Meta AI Introduces 'Make-A-Video': An Artificial Intelligence System That Generates Videos From Text". MarkTechPost.
Edwards, Benj (9 September 2022). "Runway teases AI-powered text-to-video editing using written prompts". Ars Technica.
Edwards, Benj (5 October 2022). "Google's newest AI generator creates HD video from text prompts". Ars Technica.
"Phenaki". phenaki.video.
Leswing, Kif. "Why Silicon Valley is so excited about awkward drawings done by artificial intelligence". CNBC.
Roose, Kevin (21 October 2022). "A.I.-Generated Art Is Already Transforming Creative Work". The New York Times.
Elgan, Mike (1 November 2022). "How 'synthetic media' will transform business forever". Computerworld.
"Imagen 2 on Vertex AI is now generally available". Google Cloud Blog.

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.