このページは http://www.slideshare.net/continuumio/python-scalability-a-convenient-truth の内容を掲載しています。
Python is the fastest-growing data science language and is used in production at many of the Fort...
Python is the fastest-growing data science language and is used in production at many of the Fortune 500 companies for everything from software engineering to data engineering to rapid analytics. Despite its easy-to-learn nature and its simple syntax, Python packs surprising amounts of power and performance right out of the box. For instance, many of the newest innovations in the big data ecosystem, such as columnar storage, dataflow programming, and stream processing, can all be expressed in a relatively straightforward manner using Python.
Unfortunately, it is also very easy to implement Python in ways that impede its ability to scale. For instance, many Hadoop practitioners fail to consider the implications of serialization overhead when interfacing with tools like R and Python. Others may simply be unaware of the facilities in Python to manage multicore and larger-than-memory workloads and assume that they have to move to complex distributed computing the instant they hit a memory barrier.
Travis Oliphant covers the basic concepts that lie at the heart of Python’s scalability and power and defuses myths about its performance limits. Travis looks at a few of the common antipatterns that tend to crop up as people integrate Python with Hadoop and Spark and take it into production-deployment environments. Travis will demonstrate examples of real-world code and scenarios where orders-of-magnitude performance improvement can be achieved by using better data-management techniques and artfully applying modern Python libraries for performance.