411:, or a custom solution can be implemented. Spark also supports a pseudo-distributed local mode, usually used only for development or testing purposes, where distributed storage is not required and the local file system can be used instead; in such a scenario, Spark is run on a single machine with one executor per
1840:
The last minor release within a major release will typically be maintained for longer as an “LTS” release. For example, 2.4.0 was released on
November 2, 2018, and had been maintained for 31 months until 2.4.8 was released in May 2021. 2.4.8 is the last release and no more 2.4.x releases should be
1836:
Feature release branches will, generally, be maintained with bug fix releases for a period of 18 months. For example, branch 2.3.x is no longer considered maintained as of
September 2019, 18 months after the release of 2.3.0 in February 2018. No more 2.3.x releases should be expected after that
379:. A standalone native Spark cluster can be launched manually or by the launch scripts provided by the install package. It is also possible to run the daemons on a single machine for testing. For distributed storage Spark can interface with a wide variety of distributed systems, including
2384:
Chintapalli, Sanket; Dagit, Derek; Evans, Bobby; Farivar, Reza; Graves, Thomas; Holderbaugh, Mark; Liu, Zhuo; Nusbaum, Kyle; Patil, Kishorkumar; Peng, Boyang Jerry; Poulosky, Paul (May 2016). "Benchmarking
Streaming Computation Engines: Storm, Flink and Spark Streaming".
1077:. It ingests data in mini-batches and performs RDD transformations on those mini-batches of data. This design enables the same set of application code written for batch analytics to be used in streaming analytics, thus facilitating easy implementation of
479:; fault-tolerance is achieved by keeping track of the "lineage" of each RDD (the sequence of operations that produced it) so that it can be reconstructed in the case of data loss. RDDs can contain any type of Python, .NET, Java, or Scala objects.
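The lineage idea described here can be illustrated with a small sketch in plain Python. It does not use Spark: `ToyRDD` and its fields are hypothetical names invented for this illustration. Each dataset records the operation and parent that produced it, so a lost in-memory result can be rebuilt by replaying that record.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ToyRDD:
    """Hypothetical stand-in for an RDD: data plus the lineage that produced it."""
    source: Optional[List[int]] = None                      # set only for root datasets
    parent: Optional["ToyRDD"] = None                       # dataset this one derives from
    op: Optional[Callable[[List[int]], List[int]]] = None   # how it was derived
    cache: Optional[List[int]] = None                       # materialized result, may be lost

    def map(self, f: Callable[[int], int]) -> "ToyRDD":
        # Record the operation instead of running it immediately.
        return ToyRDD(parent=self, op=lambda xs: [f(x) for x in xs])

    def filter(self, p: Callable[[int], bool]) -> "ToyRDD":
        return ToyRDD(parent=self, op=lambda xs: [x for x in xs if p(x)])

    def collect(self) -> List[int]:
        # Reuse the cached result if it survives; otherwise replay the lineage.
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = self.op(self.parent.collect())
        return list(self.cache)


numbers = ToyRDD(source=[1, 2, 3, 4, 5, 6])
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)

first = evens_squared.collect()
evens_squared.cache = None          # simulate losing the in-memory data
recovered = evens_squared.collect() # rebuilt from the recorded operations
print(first, recovered)
```

In this toy model the loss is simulated by clearing one cache; a real cluster would lose partitions on a failed node, but the recovery principle sketched here is the same.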
1302:
abstraction, and a more general MapReduce-style API. Unlike its predecessor Bagel, which was formally deprecated in Spark 1.6, GraphX has full support for property graphs (graphs where properties can be attached to edges and vertices).
1289:
framework on top of Apache Spark. Because it is based on RDDs, which are immutable, graphs are immutable and thus GraphX is unsuitable for graphs that need to be updated, let alone in a transactional manner like a
2602:
Pregel and its little sibling aggregateMessages() are the cornerstones of graph processing in GraphX. ... algorithms that require more flexibility for the terminating condition have to be implemented using
497:
A typical example of RDD-centric functional programming is the following Scala program that computes the frequencies of all words occurring in a set of text files and prints the most common ones. Each
2012:
1142:
machine-learning framework on top of Spark Core that, due in large part to the distributed memory-based Spark architecture, is as much as nine times as fast as the disk-based implementation used by
1081:. However, this convenience comes with the penalty of latency equal to the mini-batch duration. Other streaming data engines that process event by event rather than in mini-batches include
1306:
Like Apache Spark, GraphX initially started as a research project at UC Berkeley's AMPLab and
Databricks, and was later donated to the Apache Software Foundation and the Spark project.
2201:
1850:
467:
or reduce on an RDD by passing a function to Spark, which then schedules the function's execution in parallel on the cluster. These operations, and additional ones such as
842:
server. Although DataFrames lack the compile-time type-checking afforded by RDDs, as of Spark 2.0, the strongly typed Dataset is fully supported by Spark SQL as well.
2351:
1351:
Spark had in excess of 1000 contributors in 2015, making it one of the most active projects in the Apache
Software Foundation and one of the most active open source
2121:
3943:
1116:
In Spark 2.x, a separate technology based on
Datasets, called Structured Streaming, that has a higher-level interface is also provided to support streaming.
3027:
1158:. Many common machine learning and statistical algorithms have been implemented and are shipped with MLlib, which simplifies large-scale machine learning
4657:
2226:
Wang, Yandong; Goldstone, Robin; Yu, Weikuan; Wang, Teng (May 2014). "Characterization and
Optimization of Memory-Resident MapReduce on HPC Systems".
2062:
Zaharia, Matei; Chowdhury, Mosharaf; Das, Tathagata; Dave, Ankur; Ma, Justin; McCauley, Murphy; J., Michael; Shenker, Scott; Stoica, Ion (2010).
4033:
1987:
451:(the Java API is available for other JVM languages, but is also usable for some other non-JVM languages that can connect to the JVM, such as
287:
way. The
DataFrame API was released as an abstraction on top of the RDD, followed by the Dataset API. In Spark 1.x, the RDD was the primary
4642:
3885:
1820:
Spark 3.5.2 is based on Scala 2.13 (and thus works with Scala 2.12 and 2.13 out-of-the-box), but it can also be made to work with Scala 3.
1914:
517:
that performs a simple operation on a single data item (or a pair of items), and applies its argument to transform an RDD into a new RDD.
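The word-count pipeline described above can be mirrored step by step in plain Python, without Spark. This is only a sketch: the `files` list is a hypothetical in-memory stand-in for the (filename, content) pairs, and `reduce_by_key` is a helper written for this illustration, not a Spark API.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain

# Hypothetical stand-in for the (filename, content) pairs read from a directory.
files = [
    ("a.txt", "spark makes big data simple"),
    ("b.txt", "big data needs big clusters"),
]

# flatMap step: split each file into tokens and flatten into one sequence.
tokens = list(chain.from_iterable(content.split(" ") for _, content in files))

# map step: add a count of one to each token.
pairs = [(word, 1) for word in tokens]


def reduce_by_key(func, kv_pairs):
    """Group values by key, then fold each group with func (illustrative helper)."""
    grouped = defaultdict(list)
    for key, value in kv_pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


# reduceByKey step: sum the counts per word.
word_counts = reduce_by_key(lambda a, b: a + b, pairs)

# Swap word and count so that sorting orders by count, then keep the top 10.
top10 = sorted(((count, word) for word, count in word_counts.items()), reverse=True)[:10]
print(top10)
```

Each small function passed in (the splitter, the pairing, the addition) touches a single item or a pair of items, which is the property that lets a cluster scheduler run the same steps in parallel on partitions of the data.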
is a component on top of Spark Core that introduced a data abstraction called DataFrames, which provides support for structured and
3020:
2616:
2558:
1314:
Apache Spark has built-in support for Scala, Java, SQL, R, and Python, with third-party support for the .NET CLR, Julia, and more.
Apache Spark is developed by a community. The project is managed by a group called the "Project
Management Committee" (PMC).
Spark Core is the foundation of the overall project. It provides distributed task dispatching, scheduling, and basic
340:, which visit their data set multiple times in a loop, and interactive/exploratory data analysis, i.e., the repeated
Besides the RDD-oriented functional style of programming, Spark provides two restricted forms of shared variables:
Gonzalez, Joseph; Xin, Reynold; Dave, Ankur; Crankshaw, Daniel; Franklin, Michael; Stoica, Ion (Oct 2014).
Xin, Reynold; Rosen, Josh; Zaharia, Matei; Franklin, Michael; Shenker, Scott; Stoica, Ion (June 2013).
Apache Spark has its architectural foundation in the resilient distributed dataset (RDD), a read-only
2773:"The Apache Software Foundation Announces Apache™ Spark™ as a Top-Level Project"
2725:"The Apache Software Foundation Announces Apache™ Spark™ as a Top-Level Project"
1106:
352:
MapReduce implementation. Among the class of iterative algorithms are the training algorithms for
1294:. GraphX provides two separate APIs for implementation of massively parallel algorithms (such as
1221:
1192:
1082:
1074:
1894:
MLlib in R: SparkR now offers MLlib APIs Python: PySpark now offers many more MLlib algorithms"
In 2013, the project was donated to the Apache
Software Foundation and switched its license to
291:(API), but as of Spark 2.x use of the Dataset API is encouraged even though the RDD API is not
1988:"A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets: When to use them and why"
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
2617:"Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database"
2559:"Finding Graph Isomorphisms In GraphX And GraphFrames: Graph Processing vs. Graph Database"
2016 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
2372:
re-use the same aggregates we wrote for our batch application on a real-time data stream
2178:
Figure showing Spark in relation to other open-source Software projects including Hadoop
2106:
1913:
Zaharia, Matei; Chowdhury, Mosharaf; Franklin, Michael J.; Shenker, Scott; Stoica, Ion.
1063://val countsByAge = spark.sql("SELECT age, count(*) FROM people GROUP BY age")
388:
318:
the results of the map, and store reduction results on disk. Spark's RDDs function as a
virtually all Spark code you run, where DataFrames or Datasets, compiles down to an RDD
1949:
we highly recommend you to switch to use Dataset, which has better performance than RDD
886:"jdbc:mysql://yourIP:yourPort/test?user=yourUsername;password=yourPassword"
322:
for distributed programs that offers a (deliberately) restricted form of distributed
model of programming: a "driver" program invokes parallel operations such as map,
2327:"Applying the Lambda Architecture with Spark, Kafka, and Cassandra | Pluralsight"
structure on distributed programs: MapReduce programs read input data from disk,
252:
1098:
609:// Read files from "somedir" into an RDD of (filename, content) pairs.
1150:(ALS) implementations, and before Mahout itself gained a Spark interface), and
1110:
348:
of such applications may be reduced by several orders of magnitude compared to
333:(DAG). Nodes represent RDDs while edges represent the operations on the RDDs.
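As a rough illustration of this DAG view, the sketch below models a small workflow in plain Python using the standard-library `graphlib` module (Python 3.9+). The RDD names and the operation labels are hypothetical; the point is only that a valid execution order falls out of the dependency graph.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each key is an RDD (node); each value lists the RDDs it derives from.
derived_from = {
    "lines": [],
    "words": ["lines"],
    "pairs": ["words"],
    "counts": ["pairs"],
    "stopwords": [],
    "filtered": ["counts", "stopwords"],
}

# Edges carry the operation that turns a parent RDD into its child (illustrative labels).
operations = {
    ("lines", "words"): "flatMap",
    ("words", "pairs"): "map",
    ("pairs", "counts"): "reduceByKey",
    ("counts", "filtered"): "subtractByKey",
    ("stopwords", "filtered"): "subtractByKey",
}

# A topological order guarantees every RDD is computed after all of its parents.
order = list(TopologicalSorter(derived_from).static_order())

for child in order:
    for parent in derived_from[child]:
        print(f"{parent} --{operations[(parent, child)]}--> {child}")
```

Because the graph is acyclic, such an order always exists; independent branches (here `stopwords` and the `lines` chain) could be evaluated in parallel.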
283:
of data items distributed over a cluster of machines, that is maintained in a
Spark and its RDDs were developed in 2012 in response to limitations in the
243:
unified analytics engine for large-scale data processing. Spark provides an
2154:"Cluster Mode Overview - Spark 2.4.0 Documentation - Cluster Manager Types"
functionalities, exposed through an application programming interface (for
372:
2459:"Structured Streaming In Apache Spark: A new high-level API for streaming"
2235:
2228:
2014 IEEE 28th International Parallel and Distributed Processing Symposium
Spark Streaming uses Spark Core's fast scheduling capability to perform
486:
reference read-only data that needs to be available on all nodes, while
705:// Add a count of one to each token, then sum the counts per word type.
376:
356:
systems, which formed the initial impetus for developing Apache Spark.
2434:. Sigmoid (Sunnyvale, California IT product company). Archived from
at UC Berkeley's AMPLab in 2009, and open sourced in 2010 under a
1146:(according to benchmarks done by the MLlib developers against the
367:. For cluster management, Spark supports standalone native Spark,
2508:"Spark Meetup: MLbase, Distributed Machine Learning with Spark"
2297:
2126:
1102:
260:
1922:. USENIX Workshop on Hot Topics in Cloud Computing (HotCloud).
792:// Get the top 10 words. Swap word and count to sort by count.
GraphX: Graph Processing in a Distributed Dataflow Framework
2071:. USENIX Symp. Networked Systems Design and Implementation.
1912:
1851:
List of concurrent and parallel programming APIs/Frameworks
1348:
set a new world record in large scale sorting using Spark.
Spark officially sets a new record in large-scale sorting
2038:"What is Apache Spark? Spark Tutorial Guide for Beginner"
1787:
1782:
803:
165:
2641:
2293:"GitHub - DFDX/Spark.jl: Julia binding for Apache Spark"
2122:"4 reasons why Spark could jolt Hadoop into hyperdrive"
2081:
2061:
1089:. Spark Streaming has support built-in to consume from
2483:"On-Premises vs. Cloud Data Warehouses: Pros and Cons"
295:. The RDD technology still underlies the Dataset API.
16:
Open-source data analytics cluster computing framework
1344:
In November 2014, Spark founder M. Zaharia's company
1224:
techniques including alternating least squares (ALS)
471:, take RDDs as input and produce new RDDs. RDDs are
2352:"Building Lambda Architecture with Spark Streaming"
2225:
3035:
2701:"Apache Spark speeds up big data decision-making"
2663:".NET for Apache Spark | Big data analytics"
1133:
651:// Split each file into a list of tokens (words).
329:Inside Apache Spark the workflow is managed as a
4634:
1060://df.createOrReplaceTempView("people")
2505:
1831:
2779:. Apache Software Foundation. 27 February 2014
2731:. Apache Software Foundation. 27 February 2014
2514:. Spark User Meetup, San Francisco, California
830:. It also provides SQL language support, with
3879:
3021:
2506:Sparks, Evan; Talwalkar, Ameet (2013-08-06).
1792:
336:Spark facilitates the implementation of both
3886:
3872:
3028:
3014:
2998:
1916:Spark: Cluster Computing with Working Sets
226:
25:
4658:Data mining and machine learning software
2425:
2313:"Spark Release 1.3.0 | Apache Spark"
2096:
1018:// Looks at the schema of this DataFrame.
2010:
1804:
490:can be used to program reductions in an
2456:
2349:
1119:Spark can be deployed in a traditional
247:for programming clusters with implicit
4635:
3893:
2119:
2085:Shark: SQL and Rich Analytics at Scale
385:Hadoop Distributed File System (HDFS)
2199:
4643:Apache Software Foundation projects
2761:Open HUB Spark development activity
2428:"Getting Data into Spark Streaming"
1899:
1337:. In February 2014, Spark became a
1309:
1242:dimensionality reduction techniques
306:, which forces a particular linear
13:
4683:University of California, Berkeley
2689:
2426:Kharbanda, Arush (17 March 2015).
2260:
1869:Called SchemaRDDs before Spark 1.3
1699:Old version, no longer maintained:
1068:
814:(DSL) to manipulate DataFrames in
257:University of California, Berkeley
14:
4694:
4678:Software using the Apache license
4663:Free software programmed in Scala
2985:
2698:
2202:"Re: cassandra + spark / pyspark"
289:application programming interface
271:, which has maintained it since.
4616:
4615:
3846:
3845:
2849:from the original on 2022-06-18.
2809:from the original on 2021-08-25.
2350:Shapira, Gwen (29 August 2014).
2120:Harris, Derrick (28 June 2014).
1815:
919:// Create a Spark session object
889:// URL for your database server.
4087:Analysis of parallel algorithms
2615:Malak, Michael (14 June 2016).
2608:
2577:
2557:Malak, Michael (14 June 2016).
2160:. Apache Foundation. 2019-07-09
2146:
2113:
1863:
1322:Spark was initially started by
1085:and the streaming component of
552:// create a spark config object
90:2.13) / August 10, 2024
3037:The Apache Software Foundation
2958:"Apache Committee Information"
2584:Malak, Michael (1 July 2016).
2075:
2055:
2030:
2004:
1979:
1962:"Spark 2.2.0 deprecation list"
1954:
1926:
1882:
1134:MLlib Machine Learning Library
255:. Originally developed at the
1:
4034:Simultaneous and heterogenous
2457:Zaharia, Matei (2016-07-28).
2189:MapR ecosystem support matrix
2011:Chambers, Bill (2017-08-10).
1876:
1841:expected even for bug fixes.
1823:
1788:Old version, still maintained
418:
344:-style querying of data. The
4622:Category: Parallel computing
2389:. IEEE. pp. 1789–1792.
1832:Maintenance releases and EOL
1250:principal component analysis
1246:singular value decomposition
797:
603:"/path/to/somedir"
455:). This interface mirrors a
314:a function across the data,
7:
2275:, .NET Platform, 2020-09-14
2200:Doan, DuyHai (2014-09-10).
2017:Spark: The Definitive Guide
1986:Damji, Jules (2016-07-14).
1844:
1837:point, even for bug fixes.
1783:Old version, not maintained
1270:stochastic gradient descent
1236:latent Dirichlet allocation
1057://or alternatively via SQL:
274:
10:
4699:
3929:High-performance computing
2933:"Using Scala 3 with Spark"
2358:. Cloudera. Archived from
2230:. IEEE. pp. 799–808.
1317:
1205:naive Bayes classification
389:MapR File System (MapR-FS)
365:distributed storage system
269:Apache Software Foundation
4611:
4563:Automatic parallelization
4555:
4417:
4257:
4207:
4199:Application checkpointing
1934:"Spark 2.2.0 Quick Start"
1771:
1280:
1148:alternating least squares
579:// Create a spark context
475:and their operations are
267:was later donated to the
1339:Top-Level Apache Project
1285:GraphX is a distributed
1228:cluster analysis methods
1181:, random data generation
844:
812:domain-specific language
519:
359:Apache Spark requires a
4578:Embarrassingly parallel
4573:Deterministic algorithm
2590:. Manning. p. 89.
2395:10.1109/IPDPSW.2016.138
1754:Current stable version:
1735:Current stable version:
1716:Current stable version:
1222:collaborative filtering
1193:support vector machines
1054:// Counts people by age
832:command-line interfaces
810:. Spark SQL provides a
4293:Associative processing
4249:Non-blocking algorithm
4055:Clustered multi-thread
2915:"Spark 3.5.2 released"
2897:"Spark 3.4.3 released"
2879:"Spark 3.3.3 released"
2861:"Spark 3.2.4 released"
2839:"Spark 3.1.3 released"
2821:"Spark 3.0.3 released"
2799:"Spark 2.4.8 released"
2587:Spark GraphX in Action
2534:"MLlib | Apache Spark"
1800:Latest preview version
1364:Original release date
447:) centered on the RDD
331:directed acyclic graph
62:; 10 years ago
4409:Hardware acceleration
4322:Superscalar processor
4312:Dataflow architecture
3909:Distributed computing
2236:10.1109/IPDPS.2014.87
1890:"Spark Release 2.0.0"
1217:Gradient-Boosted Tree
546:"wiki_test"
92:; 41 days ago
4288:Pipelined processing
4237:Explicit parallelism
4232:Implicit parallelism
4222:Dataflow programming
2487:SearchDataManagement
808:semi-structured data
338:iterative algorithms
4512:Parallel Extensions
4317:Pipelined processor
2972:"Versioning policy"
2603:aggregateMessages()
2331:www.pluralsight.com
2107:2012arXiv1211.6176X
2042:janbasktraining.com
1274:limited-memory BFGS
1268:algorithms such as
1197:logistic regression
1175:stratified sampling
1079:lambda architecture
1075:streaming analytics
985:"dbtable"
484:broadcast variables
21:
4386:Massively parallel
4364:distributed shared
4184:Cache invalidation
4148:Instruction window
3939:Manycore processor
3919:Massively parallel
3914:Parallel computing
3895:Parallel computing
3098:Apache HTTP Server
2705:ComputerWeekly.com
2685:. 14 October 2021.
2665:. 15 October 2019.
2134:on 24 October 2017
1256:feature extraction
1179:hypothesis testing
1167:summary statistics
1126:as well as in the
991:"people"
515:anonymous function
409:Lustre file system
302:cluster computing
208:Apache License 2.0
35:Original author(s)
19:
4653:Cluster computing
4648:Big data products
4630:
4629:
4583:Parallel slowdown
4217:Stream processing
4107:Karp–Flatt metric
3861:
3860:
2623:. sparksummit.org
2565:. sparksummit.org
2438:on 15 August 2016
2404:978-1-5090-3682-0
2245:978-1-4799-3800-1
1813:
1812:
1809:
1201:linear regression
1138:Spark MLlib is a
234:
233:
140:Microsoft Windows
60:May 26, 2014
4690:
4619:
4618:
4593:Software lockout
4392:Computer cluster
4327:Vector processor
4282:Array processing
4267:Flynn's taxonomy
4174:Memory coherence
3949:Computer network
2994:Official website
2980:
2979:
2976:spark.apache.org
2919:spark.apache.org
2911:
2905:
2904:
2901:spark.apache.org
2893:
2887:
2886:
2883:spark.apache.org
2875:
2869:
2868:
2865:spark.apache.org
2857:
2851:
2850:
2843:spark.apache.org
2835:
2829:
2828:
2825:spark.apache.org
2817:
2811:
2810:
2803:spark.apache.org
2699:Clark, Lindsay.
2538:spark.apache.org
2130:. Archived from
1310:Language support
1287:graph-processing
949:"jdbc"
354:machine learning
249:data parallelism
230:
225:
222:
220:
195:machine learning
193:Data analytics,
135:Operating system
119:
117:Spark Repository
4457:Coarray Fortran
4413:
4397:Beowulf cluster
4253:
4203:
4194:Synchronization
4179:Cache coherence
4169:Multiprocessing
4157:
4121:
4102:Cost efficiency
4097:Gustafson's law
4065:
4009:
3958:
3934:Multiprocessing
3924:Cloud computing
2362:on 14 June 2016
2091:. SIGMOD 2013.
1367:Latest version
1320:
1312:
1283:
1136:
1071:
1069:Spark Streaming
1042:"age"
964:"url"
397:OpenStack Swift
361:cluster manager
277:
253:fault tolerance
56:Initial release
2986:External links
2621:slideshare.net
2607:
2596:
2576:
2563:slideshare.net
2549:
2525:
2512:slideshare.net
2498:
2474:
2463:databricks.com
2208:(Mailing list)
2206:Cassandra User
2021:O'Reilly Media
2003:
1992:databricks.com
1805:Future release
1803:
1798:
1794:Latest version
1292:graph database
1282:
1279:
1278:
1277:
1263:
1260:transformation
1253:
1239:
1225:
1219:
1185:classification
1182:
1135:
1132:
1111:TCP/IP sockets
1070:
1067:
845:
799:
796:
520:
505:(a variant of
420:
417:
285:fault-tolerant
81:Stable release
4673:Java platform
4477:Global Arrays
4404:Grid computer
2597:9781617292521
2301:. 2019-05-24.
1816:Scala Version
1370:Release date
1324:Matei Zaharia
1213:Random Forest
1210:
1209:Decision Tree
1162:, including:
1161:
1157:
1156:Vowpal Wabbit
1153:
1149:
1145:
1144:Apache Mahout
645:" "
350:Apache Hadoop
347:
343:
339:
334:
332:
327:
325:
324:shared memory
40:Matei Zaharia
38:
36:
32:
28:
23:
4162:Coordination
4092:Amdahl's law
4028:Simultaneous
3850:
3508:SpamAssassin
3497:
2975:
2966:
2952:
2940:. Retrieved
2936:
2927:
2918:
2909:
2900:
2891:
2882:
2873:
2864:
2855:
2842:
2833:
2824:
2815:
2802:
2793:
2781:. Retrieved
2776:
2767:
2756:
2745:
2733:. Retrieved
2728:
2719:
2708:. Retrieved
2704:
2680:
2671:
2657:
2651:. OSDI 2014.
2644:
2637:
2625:. Retrieved
2620:
2610:
2601:
2586:
2579:
2567:. Retrieved
2562:
2552:
2541:. Retrieved
2537:
2528:
2516:. Retrieved
2511:
2501:
2490:. Retrieved
2486:
2477:
2466:. Retrieved
2462:
2452:
2440:. Retrieved
2436:the original
2431:
2421:
2386:
2379:
2371:
2364:. Retrieved
2360:the original
2356:cloudera.com
2355:
2345:
2334:. Retrieved
2330:
2321:
2307:
2296:
2287:
2277:, retrieved
2272:dotnet/spark
2271:
2227:
2221:
2210:. Retrieved
2205:
2195:
2184:
2173:
2162:. Retrieved
2157:
2148:
2136:. Retrieved
2132:the original
2125:
2115:
2084:
2077:
2064:
2057:
2046:. Retrieved
2044:. 2018-04-13
2041:
2032:
2024:
2016:
2006:
1995:. Retrieved
1991:
1981:
1970:. Retrieved
1968:. 2017-07-11
1965:
1956:
1948:
1942:. Retrieved
1940:. 2017-07-11
1937:
1928:
1915:
1893:
1884:
1865:
1839:
1835:
1827:
1819:
1793:
1776:
1756:
1737:
1718:
1350:
1343:
1332:
1321:
1313:
1305:
1284:
1266:optimization
1171:correlations
1154:better than
1137:
1118:
1115:
1072:
901:SparkSession
874:SparkSession
801:
567:SparkContext
496:
488:accumulators
487:
483:
481:
461:higher-order
422:
373:Apache Mesos
358:
335:
328:
297:
278:
263:, the Spark
237:Apache Spark
236:
235:
154:Available in
51:Apache Spark
47:Developer(s)
20:Apache Spark
4598:Scalability
4359:distributed
4242:Concurrency
4209:Programming
4050:Cooperative
4039:Speculative
3975:Instruction
2518:10 February
2432:sigmoid.com
2138:25 February
1767:2024-08-10
1761:2023-09-09
1748:2024-04-18
1742:2023-04-13
1729:2023-08-21
1723:2022-06-16
1710:2023-04-13
1704:2021-10-13
1693:2022-02-18
1687:2021-03-02
1676:2021-06-01
1670:2020-06-18
1659:2021-05-17
1653:2018-11-02
1642:2019-09-09
1636:2018-02-28
1625:2019-01-11
1619:2017-07-11
1608:2018-06-26
1602:2016-12-28
1591:2016-11-14
1585:2016-07-26
1574:2016-11-07
1568:2016-01-04
1557:2015-11-09
1551:2015-09-09
1540:2015-07-15
1534:2015-06-11
1523:2015-04-17
1517:2015-03-13
1506:2015-04-17
1500:2014-12-18
1489:2014-11-26
1483:2014-09-11
1472:2014-08-05
1466:2014-05-26
1455:2014-07-23
1449:2014-02-02
1438:2013-12-19
1432:2013-09-25
1421:2013-07-16
1415:2013-02-27
1404:2013-02-07
1398:2012-10-15
1387:2012-11-22
1381:2012-06-12
1328:BSD license
1248:(SVD), and
1140:distributed
1124:data center
1121:on-premises
1024:countsByAge
1012:printSchema
913:getOrCreate
687:reduceByKey
511:reduceByKey
449:abstraction
369:Hadoop YARN
320:working set
241:open-source
4637:Categories
4603:Starvation
4342:asymmetric
4077:PRAM model
4045:Preemptive
3737:Deltacloud
3523:Subversion
3413:OpenOffice
3298:Jackrabbit
3238:FreeMarker
3163:CloudStack
3148:CarbonData
3128:Bloodhound
2937:47 Degrees
2777:apache.org
2729:apache.org
2710:2018-05-16
2677:"Spark.jl"
2543:2016-01-18
2492:2022-10-16
2468:2017-10-19
2336:2016-11-20
2279:2020-09-14
2212:2014-11-21
2164:2019-07-09
2158:apache.org
2048:2018-04-13
1997:2017-10-19
1972:2017-10-10
1966:apache.org
1944:2017-10-19
1938:apache.org
1877:References
1824:Developers
1355:projects.
1346:Databricks
1335:Apache 2.0
1230:including
1189:regression
540:setAppName
492:imperative
457:functional
419:Spark Core
377:Kubernetes
293:deprecated
197:algorithms
124:Written in
111:Repository
97:2024-08-10
67:2014-05-26
4337:symmetric
4082:PEM model
3732:Continuum
3653:Incubator
3606:ZooKeeper
3563:Trafodion
3553:TinkerPop
3253:Guacamole
3213:Empire-db
3198:Directory
3153:Cassandra
3044:Top-level
2098:1211.6176
1262:functions
1160:pipelines
798:Spark SQL
534:SparkConf
513:takes an
473:immutable
401:Amazon S3
393:Cassandra
300:MapReduce
245:interface
4568:Deadlock
4556:Problems
4522:pthreads
4502:OpenHMPP
4427:Ateji PX
4388:computer
4259:Hardware
4126:Elements
4112:Slowdown
4023:Temporal
4005:Pipeline
3852:Category
3826:Licenses
3767:Marmotta
3598:XMLBeans
3578:Velocity
3538:Tapestry
3533:SystemDS
3528:Superset
3518:Struts 2
3513:Struts 1
3468:RocketMQ
3373:NetBeans
3353:mod_perl
3243:Geronimo
3133:Brooklyn
3063:Airavata
3058:ActiveMQ
3053:Accumulo
3046:projects
2847:Archived
2807:Archived
2254:11157612
1845:See also
1650:2.4 LTS
1361:Version
1353:big data
1296:PageRank
1276:(L-BFGS)
1244:such as
708:wordFreq
657:wordFreq
597:textFile
413:CPU core
342:database
308:dataflow
304:paradigm
281:multiset
275:Overview
265:codebase
4527:RaftLib
4507:OpenACC
4482:GPUOpen
4472:C++ AMP
4447:Charm++
4189:Barrier
4133:Process
4117:Speedup
3902:General
3807:Tuscany
3802:Stanbol
3762:Jakarta
3757:Harmony
3717:Beehive
3660:Taverna
3644:Logging
3616:Commons
3433:Phoenix
3428:Parquet
3408:OpenNLP
3403:OpenJPA
3398:OpenEJB
3358:MyFaces
3283:Iceberg
3178:CouchDB
3173:Cordova
3158:Cayenne
3138:Calcite
3068:Airflow
2942:29 July
2783:4 March
2735:4 March
2627:11 July
2569:11 July
2413:2180634
2366:17 June
2103:Bibcode
1777:Legend:
1318:History
1232:k-means
1107:Kinesis
1099:Twitter
1036:groupBy
907:builder
627:flatMap
503:flatMap
494:style.
381:Alluxio
346:latency
221:.apache
214:Website
203:License
95: (
86:3.5.2 (
65: (
4668:Hadoop
4620:
4497:OpenCL
4492:OpenMP
4437:Chapel
4354:shared
4349:Memory
4284:(SIMT)
4227:Models
4138:Thread
4070:Theory
4041:(SpMT)
3995:Memory
3980:Thread
3963:Levels
3747:Giraph
3722:iBATIS
3634:Daemon
3593:Xerces
3583:Wicket
3558:Tomcat
3543:Thrift
3463:Roller
3423:PDFBox
3363:Mynewt
3338:Mahout
3333:Lucene
3313:JMeter
3293:Impala
3288:Ignite
3263:Hadoop
3248:Groovy
3183:cTAKES
3168:Cocoon
3078:Ambari
3073:Allura
2682:GitHub
2594:
2442:7 July
2411:
2401:
2298:GitHub
2252:
2242:
2127:Gigaom
1764:3.5.2
1745:3.4.3
1726:3.3.3
1707:3.2.4
1690:3.1.3
1673:3.0.3
1656:2.4.8
1639:2.3.4
1622:2.2.3
1605:2.1.3
1588:2.0.2
1571:1.6.3
1554:1.5.2
1537:1.4.1
1520:1.3.1
1503:1.2.2
1486:1.1.1
1469:1.0.2
1452:0.9.2
1435:0.8.1
1418:0.7.3
1401:0.6.2
1384:0.5.2
1300:Pregel
1281:GraphX
1234:, and
1152:scales
1109:, and
1103:ZeroMQ
979:option
958:option
943:format
856:apache
847:import
824:Python
802:Spark
714:sortBy
663:tokens
615:tokens
509:) and
465:filter
433:Python
363:and a
316:reduce
261:AMPLab
239:is an
170:Python
4467:Dryad
4432:Boost
4153:Array
4143:Fiber
4057:(CMT)
4030:(SMT)
3944:GPGPU
3797:Sqoop
3792:Slide
3787:Shale
3782:River
3772:MXNet
3727:Click
3712:AxKit
3700:Attic
3691:Log4j
3676:Batik
3639:Jelly
3602:Yetus
3588:Xalan
3503:Storm
3498:Spark
3488:Sling
3483:SINGA
3478:Shiro
3473:Samza
3453:Pivot
3448:Pinot
3393:Oozie
3388:OFBiz
3383:NuttX
3378:Nutch
3343:Maven
3328:Kylin
3318:Kafka
3303:James
3273:Helix
3268:HBase
3233:Flume
3228:Flink
3218:Felix
3208:Druid
3203:Drill
3193:Derby
3143:Camel
3118:Axis2
3093:Arrow
3088:Aries
2649:(PDF)
2409:S2CID
2250:S2CID
2093:arXiv
2089:(PDF)
2069:(PDF)
1920:(PDF)
1857:Notes
1298:): a
1252:(PCA)
1238:(LDA)
1128:cloud
1095:Flume
1091:Kafka
1087:Flink
1083:Storm
1048:count
931:spark
895:spark
862:spark
816:Scala
750:=>
723:=>
639:split
469:joins
453:Julia
437:Scala
219:spark
158:Scala
148:Linux
144:macOS
128:Scala
88:Scala
4532:ROCm
4462:CUDA
4452:Cilk
4419:APIs
4379:COMA
4374:NUMA
4305:MIMD
4300:MISD
4277:SIMD
4272:SISD
4000:Loop
3990:Data
3985:Task
3812:Wave
3752:Hama
3742:Etch
3707:Apex
3624:BCEL
3573:UIMA
3548:Tika
3493:Solr
3458:Qpid
3368:NiFi
3348:MINA
3323:Kudu
3308:Jena
3278:Hive
3258:Gump
3223:Flex
3123:Beam
3113:Axis
3108:Avro
2944:2022
2785:2014
2737:2014
2629:2016
2592:ISBN
2571:2016
2520:2014
2444:2016
2399:ISBN
2368:2016
2240:ISBN
2140:2016
2013:"12"
1701:3.2
1684:3.1
1667:3.0
1633:2.3
1616:2.2
1599:2.1
1582:2.0
1565:1.6
1548:1.5
1531:1.4
1514:1.3
1497:1.2
1480:1.1
1463:1.0
1446:0.9
1429:0.8
1412:0.7
1395:0.6
1378:0.5
1258:and
1187:and
1000:load
937:read
840:JDBC
836:ODBC
834:and
828:.NET
820:Java
621:data
585:data
573:conf
525:conf
477:lazy
443:and
441:.NET
429:Java
405:Kudu
251:and
223:.org
189:Type
162:Java
4547:ZPL
4542:TBB
4537:UPC
4517:PVM
4487:MPI
4442:HPX
4369:UMA
3970:Bit
3817:XML
3777:ODE
3686:Ivy
3681:FOP
3629:BSF
3443:Pig
3438:POI
3418:ORC
3188:CXF
3103:APR
3083:Ant
2391:doi
2232:doi
1757:3.5
1738:3.4
1719:3.3
1021:val
970:url
922:val
910:().
892:val
880:url
877:val
868:sql
850:org
826:or
804:SQL
780:top
777:)).
741:map
684:)).
669:map
654:val
612:val
582:val
564:new
555:val
537:().
531:new
522:val
507:map
499:map
425:I/O
375:or
312:map
259:'s
166:SQL
4639::
2974:.
2935:.
2917:.
2899:.
2881:.
2863:.
2845:.
2841:.
2823:.
2805:.
2801:.
2775:.
2727:.
2703:.
2691:^
2679:.
2619:.
2600:.
2561:.
2536:.
2510:.
2485:.
2461:.
2430:.
2407:.
2397:.
2370:.
2354:.
2329:.
2295:.
2262:^
2248:.
2238:.
2204:.
2156:.
2124:.
2101:.
2040:.
2023:.
2019:.
2015:.
1990:.
1964:.
1947:.
1936:.
1901:^
1892:.
1341:.
1330:.
1272:,
1215:,
1211:,
1207:,
1203:,
1199:,
1195:,
1191::
1177:,
1173:,
1169:,
1130:.
1113:.
1105:,
1101:,
1097:,
1093:,
1051:()
1045:).
1030:df
1015:()
1006:df
1003:()
925:df
916:()
822:,
818:,
786:10
774:_1
762:_2
738:).
735:_2
672:((
648:))
591:sc
558:sc
501:,
439:,
435:,
431:,
415:.
407:,
403:,
399:,
395:,
391:,
387:,
383:,
371:,
326:.
182:F#
180:,
178:C#
176:,
172:,
168:,
164:,
160:,
146:,
142:,
3887:e
3880:t
3873:v
3029:e
3022:t
3015:v
2978:.
2960:.
2946:.
2921:.
2903:.
2885:.
2867:.
2827:.
2787:.
2739:.
2713:.
2631:.
2573:.
2546:.
2522:.
2495:.
2471:.
2446:.
2415:.
2393::
2339:.
2315:.
2256:.
2234::
2215:.
2167:.
2142:.
2109:.
2105::
2095::
2051:.
2000:.
1975:.
1039:(
1033:.
1027:=
1009:.
997:.
994:)
988:,
982:(
976:.
973:)
967:,
961:(
955:.
952:)
946:(
940:.
934:.
928:=
904:.
898:=
883:=
871:.
865:.
859:.
853:.
838:/
789:)
783:(
771:.
768:x
765:,
759:.
756:x
753:(
747:x
744:(
732:.
729:s
726:-
720:s
717:(
711:.
702:)
699:_
696:+
693:_
690:(
681:1
678:,
675:_
666:.
660:=
642:(
636:.
633:_
630:(
624:.
618:=
606:)
600:(
594:.
588:=
576:)
570:(
561:=
549:)
543:(
528:=
459:/
445:R
174:R
99:)
69:)