Knowledge

Apache Spark

Alternatively, a custom storage solution can be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in such a scenario, Spark is run on a single machine with one executor per CPU core.
The last minor release within a major release will typically be maintained for longer as an “LTS” release. For example, 2.4.0 was released on November 2, 2018, and was maintained for 31 months until 2.4.8 was released in May 2021. 2.4.8 is the last release, and no more 2.4.x releases should be expected, even for bug fixes.

Feature release branches will generally be maintained with bug-fix releases for a period of 18 months. For example, branch 2.3.x is no longer considered maintained as of September 2019, 18 months after the release of 2.3.0 in February 2018. No more 2.3.x releases should be expected after that point, even for bug fixes.

A standalone native Spark cluster can be launched manually or by the launch scripts provided by the install package. It is also possible to run the daemons on a single machine for testing. For distributed storage, Spark can interface with a wide variety of distributed systems, including the Hadoop Distributed File System (HDFS), MapR File System (MapR-FS), OpenStack Swift, and the Lustre file system.

Spark Streaming ingests data in mini-batches and performs RDD transformations on those mini-batches of data. This design enables the same set of application code written for batch analytics to be used in streaming analytics, thus facilitating easy implementation of a lambda architecture.

Fault tolerance is achieved by keeping track of the "lineage" of each RDD (the sequence of operations that produced it) so that it can be reconstructed in the case of data loss. RDDs can contain any type of Python, .NET, Java, or Scala objects.
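The lineage idea can be made concrete with a small sketch. It assumes a Spark dependency on the classpath and a local master; the object name LineageSketch, the helper evenSquares, and the app name are illustrative and not part of the article. It builds a chain of transformations and prints the dependency description that Spark records for the final RDD.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object LineageSketch {
  // Build a short chain of transformations; nothing is computed until an action runs.
  def evenSquares(sc: SparkContext): RDD[Int] =
    sc.parallelize(1 to 10)   // base RDD created from a local collection
      .filter(_ % 2 == 0)     // derived RDD: keep the even numbers
      .map(n => n * n)        // derived RDD: square each remaining number

  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are assumptions made for this sketch.
    val conf = new SparkConf().setAppName("lineage-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val squares = evenSquares(sc)
      println(squares.toDebugString)            // describes this RDD and the RDDs it depends on
      println(squares.collect().mkString(", ")) // action: triggers the actual computation
    } finally sc.stop()
  }
}
```

If a partition of the final RDD were lost, Spark could recompute it by replaying the recorded filter and map steps over the corresponding part of the base collection.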
GraphX provides two separate APIs for implementing massively parallel algorithms: a Pregel abstraction, and a more general MapReduce-style API. Unlike its predecessor Bagel, which was formally deprecated in Spark 1.6, GraphX has full support for property graphs (graphs where properties can be attached to edges and vertices).
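A property graph of this kind can be sketched as follows, assuming the spark-graphx module is available. The vertex names, edge labels, and the object name PropertyGraphSketch are made up for illustration. Vertices carry a String property and edges carry a String label.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph, VertexId}

object PropertyGraphSketch {
  // Build a tiny property graph: each vertex has a name, each edge has a label.
  def build(sc: SparkContext): Graph[String, String] = {
    val vertices = sc.parallelize(Seq[(VertexId, String)](
      (1L, "alice"), (2L, "bob"), (3L, "carol")))
    val edges = sc.parallelize(Seq(
      Edge(1L, 2L, "follows"),
      Edge(3L, 2L, "follows"),
      Edge(2L, 1L, "blocks")))
    Graph(vertices, edges)
  }

  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are assumptions made for this sketch.
    val sc = new SparkContext(new SparkConf().setAppName("graphx-sketch").setMaster("local[*]"))
    try {
      val graph = build(sc)
      // A triplet joins an edge with the properties of both of its endpoints.
      graph.triplets.collect().foreach { t =>
        println(s"${t.srcAttr} ${t.attr} ${t.dstAttr}")
      }
      // Number of incoming edges per vertex (vertices without incoming edges are omitted).
      graph.inDegrees.collect().foreach { case (id, degree) => println(s"$id: $degree") }
    } finally sc.stop()
  }
}
```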
GraphX is a distributed graph-processing framework on top of Apache Spark. Because it is based on RDDs, which are immutable, graphs are immutable, and thus GraphX is unsuitable for graphs that need to be updated, let alone in a transactional manner like a graph database.
A typical example of RDD-centric functional programming is a Scala program that computes the frequencies of all words occurring in a set of text files and prints the most common ones.

Spark MLlib is a distributed machine-learning framework on top of Spark Core that, due in large part to the distributed memory-based Spark architecture, is as much as nine times as fast as the disk-based implementation used by Apache Mahout (according to benchmarks done by the MLlib developers against the alternating least squares (ALS) implementations, and before Mahout itself gained a Spark interface).

However, the convenience of reusing batch code for streaming comes with the penalty of latency equal to the mini-batch duration. Other streaming data engines that process event by event rather than in mini-batches include Storm and the streaming component of Flink.
Like Apache Spark, GraphX initially started as a research project at UC Berkeley's AMPLab and Databricks, and was later donated to the Apache Software Foundation and the Spark project.

A "driver" program invokes parallel operations such as map or reduce on an RDD by passing a function to Spark, which then schedules the function's execution in parallel on the cluster. These operations, and additional ones such as joins, take RDDs as input and produce new RDDs.

Spark SQL also provides SQL language support, with command-line interfaces and an ODBC/JDBC server. Although DataFrames lack the compile-time type-checking afforded by RDDs, as of Spark 2.0, the strongly typed Dataset is fully supported by Spark SQL as well.
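The difference between the untyped DataFrame and the typed Dataset can be sketched as below, assuming the spark-sql module is available. The Person case class, the sample rows, and the helper names are hypothetical. The DataFrame filter names its column in a string that is only checked when the query is analyzed at run time, whereas the Dataset filter is an ordinary Scala function checked by the compiler.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

// Hypothetical record type used only for this sketch.
final case class Person(name: String, age: Int)

object TypedVsUntyped {
  // Untyped: a misspelled column name would only fail when the query is analyzed.
  def adultsUntyped(people: DataFrame): DataFrame =
    people.filter("age >= 18")

  // Typed: a misspelled field name would be rejected by the Scala compiler.
  def adultsTyped(people: Dataset[Person]): Dataset[Person] =
    people.filter((p: Person) => p.age >= 18)

  def main(args: Array[String]): Unit = {
    // "local[*]" and the app name are assumptions made for this sketch.
    val spark = SparkSession.builder().appName("typed-vs-untyped").master("local[*]").getOrCreate()
    import spark.implicits._
    try {
      val people = Seq(Person("Ann", 34), Person("Bo", 12)).toDS()
      adultsTyped(people).show()
      adultsUntyped(people.toDF()).show()
    } finally spark.stop()
  }
}
```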
2351: 1351:
Spark had in excess of 1000 contributors in 2015, making it one of the most active projects in the Apache Software Foundation and one of the most active open source big data projects.

In Spark 2.x, a separate technology based on Datasets, called Structured Streaming, which has a higher-level interface, is also provided to support streaming.
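A minimal Structured Streaming sketch, assuming the spark-sql module and a text server on localhost port 9999 (for example one started with a tool such as netcat); the host, port, and object name are assumptions, not details from the article. The counting logic is written once as a function over a Dataset of lines, so the same function can be applied to a static Dataset or to a streaming one.

```scala
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}

object StreamingWordCount {
  // Works unchanged on a static Dataset or on a streaming Dataset of text lines.
  def wordCounts(lines: Dataset[String]): DataFrame = {
    import lines.sparkSession.implicits._
    lines
      .flatMap(_.split(" "))  // one row per word; the column is named "value"
      .groupBy("value")
      .count()                // adds a "count" column
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("structured-streaming-sketch").master("local[*]").getOrCreate()
    import spark.implicits._

    // Assumed source: lines of text arriving on a local TCP socket.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]

    // Print the full, updated table of counts to the console after every trigger.
    val query = wordCounts(lines).writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}
```

Keeping the transformation in its own function also makes it easy to check against a small static Dataset before attaching it to a live source.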
Many common machine learning and statistical algorithms have been implemented and are shipped with MLlib, which simplifies large-scale machine learning pipelines.
The Java API is available for other JVM languages, but is also usable for some other non-JVM languages that can connect to the JVM, such as Julia.
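The DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API. In Spark 1.x, the RDD was the primary application programming interface (API), but as of Spark 2.x use of the Dataset API is encouraged even though the RDD API is not deprecated.

Spark 3.5.2 is based on Scala 2.13 (and thus works with Scala 2.12 and 2.13 out-of-the-box), but it can also be made to work with Scala 3.

In the word-frequency program, each of map, flatMap (a variant of map), and reduceByKey takes an anonymous function that performs a simple operation on a single data item (or a pair of items), and applies its argument to transform an RDD into a new RDD.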
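The word-frequency computation can be sketched as follows. This is a reconstruction under assumptions rather than the article's original listing: the object and function names are illustrative, the input path is a placeholder, and textFile is used so that each element of the RDD is one line of text.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object WordFrequencies {
  // Return the n most frequent words as (word, count) pairs, most frequent first.
  def topWords(lines: RDD[String], n: Int): Array[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))  // split each line into tokens (words)
      .filter(word => word.nonEmpty)     // drop empty tokens caused by repeated spaces
      .map(word => (word, 1))            // add a count of one to each token
      .reduceByKey(_ + _)                // sum the counts per distinct word
      .sortBy(_._2, ascending = false)   // order by count, largest first
      .take(n)

  def main(args: Array[String]): Unit = {
    // The master setting is an assumption so that the sketch runs on one machine.
    val conf = new SparkConf().setAppName("wiki_test").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val lines = sc.textFile("/path/to/somedir") // placeholder path: one RDD element per line
      topWords(lines, 10).foreach { case (word, count) => println(s"$word: $count") }
    } finally sc.stop()
  }
}
```

Each anonymous function here touches a single element or a pair of values, and every step returns a new RDD rather than modifying the previous one.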
4682: 2482: 2749: 2427: 4677: 4662: 3680: 2806: 806:
Spark SQL is a component on top of Spark Core that introduced a data abstraction called DataFrames, which provides support for structured and semi-structured data.

Apache Spark has built-in support for Scala, Java, SQL, R, and Python, with third-party support for the .NET CLR, Julia, and more.

Apache Spark is developed by a community. The project is managed by a group called the "Project Management Committee" (PMC).

Spark Core is the foundation of the overall project. It provides distributed task dispatching, scheduling, and basic I/O functionalities.

Spark facilitates the implementation of both iterative algorithms, which visit their data set multiple times in a loop, and interactive/exploratory data analysis, i.e., the repeated database-style querying of data.

Besides the RDD-oriented functional style of programming, Spark provides two restricted forms of shared variables: broadcast variables and accumulators.
188: 2643: 2153: 4271: 4086: 2083: 2507: 1338: 4378: 4292: 4241: 3036: 448: 181: 177: 4602: 4436: 4287: 3974: 3623: 823: 432: 169: 2063: 384: 280: 2642:
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only
4562: 4541: 4486: 4373: 4363: 4336: 4198: 464: 4516: 4142: 4081: 3994: 3628: 1241: 811: 460: 444: 284: 173: 2773:"The Apache Software Foundation Announces Apache&#8482 Spark&#8482 as a Top-Level Project" 2725:"The Apache Software Foundation Announces Apache&#8482 Spark&#8482 as a Top-Level Project" 1106: 352:
MapReduce implementation. Among the class of iterative algorithms are the training algorithms for
4577: 4572: 4431: 4022: 3102: 1294:. GraphX provides two separate APIs for implementation of massively parallel algorithms (such as 1221: 1192: 1082: 1074: 1894:
In 2013, the project was donated to the Apache Software Foundation and switched its license to Apache License 2.0.
1204: 831: 491: 456: 330: 311: 291:(API), but as of Spark 2.x use of the Dataset API is encouraged even though the RDD API is not 4408: 4368: 4321: 4311: 4106: 3969: 3908: 3567: 2435: 2312: 1988:"A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets: When to use them and why" 1889: 1139: 364: 345: 2188: 2065:
1063://val countsByAge = spark.sql("SELECT age, count(*) FROM people GROUP BY age") 388: 318:
the results of the map, and store reduction results on disk. Spark's RDDs function as a
227: 4536: 4385: 4358: 4183: 4147: 4137: 4096: 3938: 3918: 3913: 3894: 3097: 2932: 2408: 2249: 2092: 2025:
1933: 1255: 1231: 1178: 1166: 514: 207: 26: 2037: 59: 4582: 4258: 4216: 4111: 3522: 3412: 3297: 3162: 3147: 3127: 2591: 2398: 2239: 2020: 1216: 1200: 1151: 1147: 886:"jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword" 322:
for distributed programs that offers a (deliberately) restricted form of distributed
139: 2838: 2253: 4592: 4391: 4326: 4173: 3989: 3984: 3979: 3948: 3731: 3605: 3562: 3552: 3252: 3212: 3197: 3152: 2957: 2412: 2390: 2231: 1227: 472: 392: 353: 248: 202: 194: 134: 2971: 463:
model of programming: a "driver" program invokes parallel operations such as map,
116: 4667: 4456: 4396: 4331: 4178: 4168: 4101: 4091: 3933: 3923: 3766: 3761: 3741: 3597: 3577: 3537: 3532: 3527: 3512: 3467: 3242: 3132: 3062: 3057: 3052: 3005: 2585: 2327:"Applying the Lambda Architecture with Spark, Kafka, and Cassandra | Pluralsight" 1646: 1127: 476: 468: 360: 310:
structure on distributed programs: MapReduce programs read input data from disk,
252: 1098: 609:// Read files from "somedir" into an RDD of (filename, content) pairs. 4587: 4403: 4060: 3953: 3832: 3806: 3801: 3756: 3716: 3659: 3633: 3615: 3432: 3427: 3407: 3402: 3397: 3357: 3282: 3177: 3172: 3157: 3137: 3067: 2662: 1334: 1291: 1150:(ALS) implementations, and before Mahout itself gained a Spark interface), and 1110: 348:
of such applications may be reduced by several orders of magnitude compared to
Inside Apache Spark, the workflow is managed as a directed acyclic graph (DAG). Nodes represent RDDs while edges represent the operations on the RDDs.
of data items distributed over a cluster of machines, that is maintained in a
4636: 4476: 4353: 3791: 3746: 3721: 3592: 3582: 3557: 3542: 3517: 3462: 3422: 3362: 3337: 3332: 3312: 3292: 3287: 3262: 3247: 3182: 3167: 3077: 3072: 1697: 1680: 1663: 1629: 1612: 1595: 1578: 1561: 1544: 1527: 1510: 1493: 1476: 1459: 1442: 1425: 1408: 1391: 1374: 1323: 1212: 1208: 1155: 1143: 424: 368: 349: 323: 39: 2394: 298:
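Spark and its RDDs were developed in 2012 in response to limitations in the MapReduce cluster computing paradigm.

Apache Spark is an open-source unified analytics engine for large-scale data processing. Spark provides an interface for programming clusters with implicit data parallelism and fault tolerance.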
4076: 3786: 3771: 3726: 3675: 3638: 3587: 3502: 3487: 3482: 3477: 3472: 3452: 3447: 3392: 3387: 3377: 3342: 3327: 3317: 3302: 3272: 3267: 3232: 3227: 3217: 3207: 3202: 3192: 3142: 3117: 3092: 3087: 2177: 2154:"Cluster Mode Overview - Spark 2.4.0 Documentation - Cluster Manager Types" 1327: 1094: 1090: 1086: 427:
functionalities, exposed through an application programming interface (for
372: 2459:"Structured Streaming In Apache Spark: A new high-level API for streaming" 2235: 2228:
2014 IEEE 28th International Parallel and Distributed Processing Symposium
4597: 3811: 3751: 3706: 3547: 3492: 3457: 3367: 3347: 3322: 3307: 3277: 3257: 3222: 3122: 3112: 3107: 2760: 1123: 404: 319: 256: 1073:
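Spark Streaming uses Spark Core's fast scheduling capability to perform streaming analytics.

Broadcast variables reference read-only data that needs to be available on all nodes, while accumulators can be used to program reductions in an imperative style.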
3816: 3776: 3736: 3685: 3442: 3437: 3417: 3237: 3187: 3082: 1345: 705:// Add a count of one to each token, then sum the counts per word type. 376: 356:
systems, which formed the initial impetus for developing Apache Spark.
292: 46: 34: 4471: 4446: 3863: 400: 299: 2434:. Sigmoid (Sunnyvale, California IT product company). Archived from 4521: 4501: 4426: 3372: 3352: 2676: 2292: 1352: 1295: 412: 341: 307: 264: 2533: 2270: 2097: 1752: 1733: 1714: 1326:
Spark was initially started by Matei Zaharia at UC Berkeley's AMPLab in 2009, and open sourced in 2010 under a BSD license.
1146:(according to benchmarks done by the MLlib developers against the 367:. For cluster management, Spark supports standalone native Spark, 4526: 4506: 4481: 4116: 380: 4496: 4491: 2681: 2508:"Spark Meetup: MLbase, Distributed Machine Learning with Spark" 2297: 2126: 1102: 260: 1922:. USENIX Workshop on Hot Topics in Cloud Computing (HotCloud). 792:// Get the top 10 words. Swap word and count to sort by count. 3796: 3711: 3690: 3382: 147: 143: 2993: 2645:
GraphX: Graph Processing in a Distributed Dataflow Framework
2383: 218: 4531: 4461: 4451: 3781: 3572: 2071:. USENIX Symp. Networked Systems Design and Implementation. 1912: 1851:
List of concurrent and parallel programming APIs/Frameworks
1348:
The RDD technology still underlies the Dataset API.

Open-source data analytics cluster computing framework

In November 2014, Spark founder M. Zaharia's company Databricks set a new world record in large-scale sorting using Spark.
1224:
MLlib Machine Learning Library

Algorithms and utilities shipped with MLlib include:

summary statistics, stratified sampling, hypothesis testing, and random data generation
classification and regression: support vector machines, logistic regression, linear regression, and naive Bayes classification
collaborative filtering techniques including alternating least squares (ALS)
cluster analysis methods, including latent Dirichlet allocation
dimensionality reduction techniques such as singular value decomposition and principal component analysis
feature extraction and transformation functions
optimization algorithms such as stochastic gradient descent and limited-memory BFGS

Spark can be deployed in a traditional on-premises data center as well as in the cloud.

In February 2014, Spark became a Top-Level Apache Project.

Note: DataFrames were called SchemaRDDs before Spark 1.3.
2300: 2299: 2294: 2288: 2274: 2273: 2266: 2264: 2255: 2251: 2247: 2241: 2237: 2233: 2229: 2222: 2207: 2203: 2196: 2190: 2185: 2179: 2174: 2159: 2155: 2149: 2133: 2129: 2128: 2123: 2116: 2108: 2104: 2099: 2094: 2087: 2086: 2078: 2067: 2066: 2058: 2043: 2039: 2033: 2026: 2022: 2018: 2014: 2007: 1993: 1989: 1982: 1967: 1963: 1957: 1950: 1939: 1935: 1929: 1918: 1917: 1909: 1907: 1905: 1903: 1895: 1891: 1885: 1881: 1866: 1862: 1852: 1849: 1848: 1842: 1838: 1829: 1821: 1816:Scala Version 1795: 1778: 1770: 1766: 1763: 1760: 1758: 1751: 1747: 1744: 1741: 1739: 1732: 1728: 1725: 1722: 1720: 1713: 1709: 1706: 1703: 1696: 1692: 1689: 1686: 1679: 1675: 1672: 1669: 1662: 1658: 1655: 1652: 1645: 1641: 1638: 1635: 1628: 1624: 1621: 1618: 1611: 1607: 1604: 1601: 1594: 1590: 1587: 1584: 1577: 1573: 1570: 1567: 1560: 1556: 1553: 1550: 1543: 1539: 1536: 1533: 1526: 1522: 1519: 1516: 1509: 1505: 1502: 1499: 1492: 1488: 1485: 1482: 1475: 1471: 1468: 1465: 1458: 1454: 1451: 1448: 1441: 1437: 1434: 1431: 1424: 1420: 1417: 1414: 1407: 1403: 1400: 1397: 1390: 1386: 1383: 1380: 1373: 1370:Release date 1369: 1366: 1363: 1360: 1359: 1356: 1354: 1349: 1347: 1342: 1340: 1336: 1331: 1329: 1325: 1324:Matei Zaharia 1315: 1307: 1304: 1301: 1297: 1293: 1288: 1275: 1271: 1267: 1264: 1261: 1257: 1254: 1251: 1247: 1243: 1240: 1237: 1233: 1229: 1226: 1223: 1220: 1218: 1214: 1213:Random Forest 1210: 1209:Decision Tree 1206: 1202: 1198: 1194: 1190: 1186: 1183: 1180: 1176: 1172: 1168: 1165: 1164: 1163: 1162:, including: 1161: 1157: 1156:Vowpal Wabbit 1153: 1149: 1145: 1144:Apache Mahout 1141: 1131: 1129: 1125: 1122: 1117: 1114: 1112: 1108: 1104: 1100: 1096: 1092: 1088: 1084: 1080: 1076: 843: 841: 837: 833: 829: 825: 821: 817: 813: 809: 805: 645:" " 518: 516: 495: 493: 489: 485: 480: 478: 474: 470: 466: 462: 458: 454: 450: 446: 442: 438: 434: 430: 426: 416: 414: 410: 406: 402: 398: 394: 390: 386: 382: 378: 374: 370: 366: 362: 357: 355: 351: 350:Apache Hadoop 347: 343: 339: 334: 332: 327: 325: 324:shared 
memory 321: 317: 313: 309: 305: 301: 296: 294: 290: 286: 282: 272: 270: 266: 262: 258: 254: 250: 246: 242: 238: 229: 224: 216: 212: 209: 206: 204: 200: 196: 192: 190: 186: 183: 179: 175: 171: 167: 163: 159: 156: 152: 149: 145: 141: 138: 136: 132: 129: 126: 122: 118: 114: 112: 108: 104: 89: 84: 82: 78: 74: 71: 58: 54: 50: 48: 44: 41: 40:Matei Zaharia 38: 36: 32: 28: 23: 4162:Coordination 4092:Amdahl's law 4028:Simultaneous 3850: 3508:SpamAssassin 3497: 2975: 2966: 2952: 2940:. Retrieved 2936: 2927: 2918: 2909: 2900: 2891: 2882: 2873: 2864: 2855: 2842: 2833: 2824: 2815: 2802: 2793: 2781:. Retrieved 2776: 2767: 2756: 2745: 2733:. Retrieved 2728: 2719: 2708:. Retrieved 2704: 2680: 2671: 2657: 2651:. OSDI 2014. 2644: 2637: 2625:. Retrieved 2620: 2610: 2601: 2586: 2579: 2567:. Retrieved 2562: 2552: 2541:. Retrieved 2537: 2528: 2516:. Retrieved 2511: 2501: 2490:. Retrieved 2486: 2477: 2466:. Retrieved 2462: 2452: 2440:. Retrieved 2436:the original 2431: 2421: 2386: 2379: 2371: 2364:. Retrieved 2360:the original 2356:cloudera.com 2355: 2345: 2334:. Retrieved 2330: 2321: 2307: 2296: 2287: 2277:, retrieved 2272:dotnet/spark 2271: 2227: 2221: 2210:. Retrieved 2205: 2195: 2184: 2173: 2162:. Retrieved 2157: 2148: 2136:. Retrieved 2132:the original 2125: 2115: 2084: 2077: 2064: 2057: 2046:. Retrieved 2044:. 2018-04-13 2041: 2032: 2024: 2016: 2006: 1995:. Retrieved 1991: 1981: 1970:. Retrieved 1968:. 2017-07-11 1965: 1956: 1948: 1942:. Retrieved 1940:. 
2017-07-11 1937: 1928: 1915: 1893: 1884: 1865: 1839: 1835: 1827: 1819: 1793: 1776: 1756: 1737: 1718: 1350: 1343: 1332: 1321: 1313: 1305: 1284: 1266:optimization 1171:correlations 1154:better than 1137: 1118: 1115: 1072: 901:SparkSession 874:SparkSession 801: 567:SparkContext 496: 488:accumulators 487: 483: 481: 461:higher-order 422: 373:Apache Mesos 358: 335: 328: 297: 278: 263:, the Spark 237:Apache Spark 236: 235: 154:Available in 51:Apache Spark 47:Developer(s) 20:Apache Spark 4598:Scalability 4359:distributed 4242:Concurrency 4209:Programming 4050:Cooperative 4039:Speculative 3975:Instruction 2518:10 February 2432:sigmoid.com 2138:25 February 1767:2024-08-10 1761:2023-09-09 1748:2024-04-18 1742:2023-04-13 1729:2023-08-21 1723:2022-06-16 1710:2023-04-13 1704:2021-10-13 1693:2022-02-18 1687:2021-03-02 1676:2021-06-01 1670:2020-06-18 1659:2021-05-17 1653:2018-11-02 1642:2019-09-09 1636:2018-02-28 1625:2019-01-11 1619:2017-07-11 1608:2018-06-26 1602:2016-12-28 1591:2016-11-14 1585:2016-07-26 1574:2016-11-07 1568:2016-01-04 1557:2015-11-09 1551:2015-09-09 1540:2015-07-15 1534:2015-06-11 1523:2015-04-17 1517:2015-03-13 1506:2015-04-17 1500:2014-12-18 1489:2014-11-26 1483:2014-09-11 1472:2014-08-05 1466:2014-05-26 1455:2014-07-23 1449:2014-02-02 1438:2013-12-19 1432:2013-09-25 1421:2013-07-16 1415:2013-02-27 1404:2013-02-07 1398:2012-10-15 1387:2012-11-22 1381:2012-06-12 1328:BSD license 1248:(SVD), and 1140:distributed 1124:data center 1121:on-premises 1024:countsByAge 1012:printSchema 913:getOrCreate 687:reduceByKey 511:reduceByKey 449:abstraction 369:Hadoop YARN 320:working set 241:open-source 4637:Categories 4603:Starvation 4342:asymmetric 4077:PRAM model 4045:Preemptive 3737:Deltacloud 3523:Subversion 3413:OŃ€enOffice 3298:Jackrabbit 3238:FreeMarker 3163:CloudStack 3148:CarbonData 3128:Bloodhound 2937:47 Degrees 2777:apache.org 2729:apache.org 2710:2018-05-16 2677:"Spark.jl" 2543:2016-01-18 2492:2022-10-16 2468:2017-10-19 2336:2016-11-20 2279:2020-09-14 
2212:2014-11-21 2164:2019-07-09 2158:apache.org 2048:2018-04-13 1997:2017-10-19 1972:2017-10-10 1966:apache.org 1944:2017-10-19 1938:apache.org 1877:References 1824:Developers 1355:projects. 1346:Databricks 1335:Apache 2.0 1230:including 1189:regression 540:setAppName 492:imperative 457:functional 419:Spark Core 377:Kubernetes 293:deprecated 197:algorithms 124:Written in 111:Repository 97:2024-08-10 67:2014-05-26 4337:symmetric 4082:PEM model 3732:Continuum 3653:Incubator 3606:ZooKeeper 3563:Trafodion 3553:TinkerPop 3253:Guacamole 3213:Empire-db 3198:Directory 3153:Cassandra 3044:Top-level 2098:1211.6176 1262:functions 1160:pipelines 798:Spark SQL 534:SparkConf 513:takes an 473:immutable 401:Amazon S3 393:Cassandra 300:MapReduce 245:interface 4568:Deadlock 4556:Problems 4522:pthreads 4502:OpenHMPP 4427:Ateji PX 4388:computer 4259:Hardware 4126:Elements 4112:Slowdown 4023:Temporal 4005:Pipeline 3852:Category 3826:Licenses 3767:Marmotta 3598:XMLBeans 3578:Velocity 3538:Tapestry 3533:SystemDS 3528:Superset 3518:Struts 2 3513:Struts 1 3468:RocketMQ 3373:NetBeans 3353:mod_perl 3243:Geronimo 3133:Brooklyn 3063:Airavata 3058:ActiveMQ 3053:Accumulo 3046:projects 2847:Archived 2807:Archived 2254:11157612 1845:See also 1650:2.4 LTS 1361:Version 1353:big data 1296:PageRank 1276:(L-BFGS) 1244:such as 708:wordFreq 657:wordFreq 597:textFile 413:CPU core 342:database 308:dataflow 304:paradigm 281:multiset 275:Overview 265:codebase 4527:RaftLib 4507:OpenACC 4482:GPUOpen 4472:C++ AMP 4447:Charm++ 4189:Barrier 4133:Process 4117:Speedup 3902:General 3807:Tuscany 3802:Stanbol 3762:Jakarta 3757:Harmony 3717:Beehive 3660:Taverna 3644:Logging 3616:Commons 3433:Phoenix 3428:Parquet 3408:OpenNLP 3403:OpenJPA 3398:OpenEJB 3358:MyFaces 3283:Iceberg 3178:CouchDB 3173:Cordova 3158:Cayenne 3138:Calcite 3068:Airflow 2942:29 July 2783:4 March 2735:4 March 2627:11 July 2569:11 July 2413:2180634 2366:17 June 2103:Bibcode 1777:Legend: 1318:History 1232:k-means 1107:Kinesis 1099:Twitter 1036:groupBy 
907:builder 627:flatMap 503:flatMap 494:style. 381:Alluxio 346:latency 221:.apache 214:Website 203:License 95: ( 86:3.5.2 ( 65: ( 4668:Hadoop 4620:  4497:OpenCL 4492:OpenMP 4437:Chapel 4354:shared 4349:Memory 4284:(SIMT) 4227:Models 4138:Thread 4070:Theory 4041:(SpMT) 3995:Memory 3980:Thread 3963:Levels 3747:Giraph 3722:iBATIS 3634:Daemon 3593:Xerces 3583:Wicket 3558:Tomcat 3543:Thrift 3463:Roller 3423:PDFBox 3363:Mynewt 3338:Mahout 3333:Lucene 3313:JMeter 3293:Impala 3288:Ignite 3263:Hadoop 3248:Groovy 3183:cTAKES 3168:Cocoon 3078:Ambari 3073:Allura 2682:GitHub 2594:  2442:7 July 2411:  2401:  2298:GitHub 2252:  2242:  2127:Gigaom 1764:3.5.2 1745:3.4.3 1726:3.3.3 1707:3.2.4 1690:3.1.3 1673:3.0.3 1656:2.4.8 1639:2.3.4 1622:2.2.3 1605:2.1.3 1588:2.0.2 1571:1.6.3 1554:1.5.2 1537:1.4.1 1520:1.3.1 1503:1.2.2 1486:1.1.1 1469:1.0.2 1452:0.9.2 1435:0.8.1 1418:0.7.3 1401:0.6.2 1384:0.5.2 1300:Pregel 1281:GraphX 1234:, and 1152:scales 1109:, and 1103:ZeroMQ 979:option 958:option 943:format 856:apache 847:import 824:Python 802:Spark 714:sortBy 663:tokens 615:tokens 509:) and 465:filter 433:Python 363:and a 316:reduce 261:AMPLab 239:is an 170:Python 4467:Dryad 4432:Boost 4153:Array 4143:Fiber 4057:(CMT) 4030:(SMT) 3944:GPGPU 3797:Sqoop 3792:Slide 3787:Shale 3782:River 3772:MXNet 3727:Click 3712:AxKit 3700:Attic 3691:Log4j 3676:Batik 3639:Jelly 3602:Yetus 3588:Xalan 3503:Storm 3498:Spark 3488:Sling 3483:SINGA 3478:Shiro 3473:Samza 3453:Pivot 3448:Pinot 3393:Oozie 3388:OFBiz 3383:NuttX 3378:Nutch 3343:Maven 3328:Kylin 3318:Kafka 3303:James 3273:Helix 3268:HBase 3233:Flume 3228:Flink 3218:Felix 3208:Druid 3203:Drill 3193:Derby 3143:Camel 3118:Axis2 3093:Arrow 3088:Aries 2649:(PDF) 2409:S2CID 2250:S2CID 2093:arXiv 2089:(PDF) 2069:(PDF) 1920:(PDF) 1857:Notes 1298:): a 1252:(PCA) 1238:(LDA) 1128:cloud 1095:Flume 1091:Kafka 1087:Flink 1083:Storm 1048:count 931:spark 895:spark 862:spark 816:Scala 750:=> 723:=> 639:split 469:joins 453:Julia 437:Scala 219:spark 158:Scala 148:Linux 
144:macOS 128:Scala 88:Scala 4532:ROCm 4462:CUDA 4452:Cilk 4419:APIs 4379:COMA 4374:NUMA 4305:MIMD 4300:MISD 4277:SIMD 4272:SISD 4000:Loop 3990:Data 3985:Task 3812:Wave 3752:Hama 3742:Etch 3707:Apex 3624:BCEL 3573:UIMA 3548:Tika 3493:Solr 3458:Qpid 3368:NiFi 3348:MINA 3323:Kudu 3308:Jena 3278:Hive 3258:Gump 3223:Flex 3123:Beam 3113:Axis 3108:Avro 2944:2022 2785:2014 2737:2014 2629:2016 2592:ISBN 2571:2016 2520:2014 2444:2016 2399:ISBN 2368:2016 2240:ISBN 2140:2016 2013:"12" 1701:3.2 1684:3.1 1667:3.0 1633:2.3 1616:2.2 1599:2.1 1582:2.0 1565:1.6 1548:1.5 1531:1.4 1514:1.3 1497:1.2 1480:1.1 1463:1.0 1446:0.9 1429:0.8 1412:0.7 1395:0.6 1378:0.5 1258:and 1187:and 1000:load 937:read 840:JDBC 836:ODBC 834:and 828:.NET 820:Java 621:data 585:data 573:conf 525:conf 477:lazy 443:and 441:.NET 429:Java 405:Kudu 251:and 223:.org 189:Type 162:Java 4547:ZPL 4542:TBB 4537:UPC 4517:PVM 4487:MPI 4442:HPX 4369:UMA 3970:Bit 3817:XML 3777:ODE 3686:Ivy 3681:FOP 3629:BSF 3443:Pig 3438:POI 3418:ORC 3188:CXF 3103:APR 3083:Ant 2391:doi 2232:doi 1757:3.5 1738:3.4 1719:3.3 1021:val 970:url 922:val 910:(). 892:val 880:url 877:val 868:sql 850:org 826:or 804:SQL 780:top 777:)). 741:map 684:)). 669:map 654:val 612:val 582:val 564:new 555:val 537:(). 531:new 522:val 507:map 499:map 425:I/O 375:or 312:map 259:'s 166:SQL 4639:: 2974:. 2935:. 2917:. 2899:. 2881:. 2863:. 2845:. 2841:. 2823:. 2805:. 2801:. 2775:. 2727:. 2703:. 2691:^ 2679:. 2619:. 2600:. 2561:. 2536:. 2510:. 2485:. 2461:. 2430:. 2407:. 2397:. 2370:. 2354:. 2329:. 2295:. 2262:^ 2248:. 2238:. 2204:. 2156:. 2124:. 2101:. 2040:. 2023:. 2019:. 2015:. 1990:. 1964:. 1947:. 1936:. 1901:^ 1892:. 1341:. 1330:. 1272:, 1215:, 1211:, 1207:, 1203:, 1199:, 1195:, 1191:: 1177:, 1173:, 1169:, 1130:. 1113:. 1105:, 1101:, 1097:, 1093:, 1051:() 1045:). 1030:df 1015:() 1006:df 1003:() 925:df 916:() 822:, 818:, 786:10 774:_1 762:_2 738:). 735:_2 672:(( 648:)) 591:sc 558:sc 501:, 439:, 435:, 431:, 415:. 407:, 403:, 399:, 395:, 391:, 387:, 383:, 371:, 326:. 
182:F# 180:, 178:C# 176:, 172:, 168:, 164:, 160:, 146:, 142:, 3887:e 3880:t 3873:v 3029:e 3022:t 3015:v 2978:. 2960:. 2946:. 2921:. 2903:. 2885:. 2867:. 2827:. 2787:. 2739:. 2713:. 2631:. 2573:. 2546:. 2522:. 2495:. 2471:. 2446:. 2415:. 2393:: 2339:. 2315:. 2256:. 2234:: 2215:. 2167:. 2142:. 2109:. 2105:: 2095:: 2051:. 2000:. 1975:. 1039:( 1033:. 1027:= 1009:. 997:. 994:) 988:, 982:( 976:. 973:) 967:, 961:( 955:. 952:) 946:( 940:. 934:. 928:= 904:. 898:= 883:= 871:. 865:. 859:. 853:. 838:/ 789:) 783:( 771:. 768:x 765:, 759:. 756:x 753:( 747:x 744:( 732:. 729:s 726:- 720:s 717:( 711:. 702:) 699:_ 696:+ 693:_ 690:( 681:1 678:, 675:_ 666:. 660:= 642:( 636:. 633:_ 630:( 624:. 618:= 606:) 600:( 594:. 588:= 576:) 570:( 561:= 549:) 543:( 528:= 459:/ 445:R 174:R 99:) 69:)


Original author(s): Matei Zaharia
Developer(s): Apache Spark
Initial release: May 26, 2014
Stable release: 3.5.2 / August 10, 2024
Repository: Spark Repository
Written in: Scala
Operating system: Microsoft Windows, macOS, Linux
Available in: Scala, Java, SQL, Python, R, C#, F#
Type: Data analytics, machine learning algorithms
License: Apache License 2.0
Website: spark.apache.org

Text is available under the Creative Commons Attribution-ShareAlike License. Additional terms may apply.
