This page reproduces the content of http://www.slideshare.net/marktab/data-analysis-with-r-and-julia-201305.

Uploaded more than 3 years ago (2013/05/09) in Business

R is a free, open-source environment for statistical analysis and graphing. In its almost 20 years of existence, R has remained popular in both academic and business environments. The newer Julia is a high-level, high-performance dynamic programming language for technical computing, with syntax that is familiar to users of other technical computing environments. This session outlines functional and performance differences between these two software packages. You’ll see demonstrations of best tips for integrating this software with Windows and walk away with guidelines for working with commercial software. A version of this presentation had 100 attendees at the PASS Business Analytics Conference in Chicago (April 2013), and 40 attendees for the PASS Virtual Business Analytics meeting (May 2013).

- Data Analysis with R and Julia

Advanced Analytics and Insights

Mark Tabladillo Ph.D., Data Mining Scientist, MarkTab Inc.

- Networking

Interactive

- About MarkTab

Training and Consulting with http://marktab.com

Data Mining Resources and Blog at http://marktab.net

Twitter @marktabnet

- Outline

R Language

Market Analysis

Performance

Production Use

Julia Language

Performance

- The R Language

http://cran.r-project.org

- Major R Versions

| Version | Year | Description |
|---|---|---|
| 0 | 1996 | Initial release: University of Auckland, New Zealand |
| 1 | 2000 | Completeness and stability high enough to characterize a full statistical system, which could be put to production use |
| 2 | 2004 | Strong enhancements of the memory management subsystem as well as several major features, including Sweave (into LaTeX or LyX) |
| 3 | 2013 | The inclusion of long vectors (containing more than 2^31-1 elements!). Also, we now have 64 bit support on all platforms, support for parallel processing, the Matrix package |

http://www.r-project.org/

- How R Works

As with an automobile, you can use R without worrying very much about how it works. But computing with data is more complicated than driving a car (fortunately for highway safety)

John Chambers

Software for Data Analysis, page 453

- R works in a shell

Cross-platform, including Windows 32-bit (x86) or 64-bit (x64)

Interactive graphical user interface (GUI) to interpret commands

Read – accept user input

Parse – interpret input using expected syntax

Evaluate – execute commands

Everything is an object

Data are stored in data frames, named lists

R implements S language grammar, with a few extensions

- R GUI
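The Read, Parse, Evaluate steps listed above can be reproduced by hand in an R session. The following is a minimal sketch, not taken from the slides: the command string and the small data frame are made-up examples, and a string stands in for text typed at the console.

```r
# Read: the command arrives as plain text (a string stands in for console input).
input <- "mean(c(2, 4, 6))"

# Parse: turn the text into an unevaluated R expression.
parsed <- parse(text = input)

# Evaluate: execute the parsed call and get a value back.
result <- eval(parsed[[1]])
print(result)

# Everything is an object: even the parsed command has a class.
print(class(parsed))
print(class(parsed[[1]]))

# A data frame is stored as a named list of equal-length columns.
df <- data.frame(x = 1:3, y = c("a", "b", "c"))
print(is.list(df))
print(names(df))
```

The interactive console repeats these three steps for every command entered, which is why a parsed command can itself be inspected like any other object.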
- Read-Parse-Evaluate Loop

Read → Parse → Evaluate, then back to Read

- R Market Analysis
- Listserv Discussion

http://r4stats.com/articles/popularity/

- Estimated R Usage

Estimated 250,000 people use it regularly (as of 2009)

http://www.nytimes.com/2009/01/07/technology/business-computing/07program.html?pagewanted=2&_r=0

- General Forum Postings

http://r4stats.com/articles/popularity/

- Stack Overflow Alone

http://r4stats.com/articles/popularity/

- Academic Publications

http://r4stats.com/articles/popularity/

- Comparison of R, Matlab, SAS, Stata, SPSS

http://www.analyticbridge.com/group/productreviews2/forum/topics/product-reviews-comparing-r-matlab-sas-stata-spss

- R Performance
- R is Memory-Bound

$$\frac{\text{Memory Size}}{4} = \text{Amount of R Data}$$

Source: Joseph B. Rickert, February 14, 2013

$$\text{Memory Size}_{\text{64-bit}} = \text{RAM}$$

$$\text{Memory Size}_{\text{32-bit}} = \text{User Virtual Memory} - 0.5\,\text{GB} \approx 2\,\text{GB}$$

Source: http://cran.r-project.org/bin/windows/base/rw-FAQ.html retrieved March 1, 2013

- R is Memory-Bound

All objects in an R session are stored in memory

R places a limit of 2^31 − 1 bytes on all object sizes, independent of RAM

The Art of R Programming, Norman Matloff

- R Memory Management
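To put the per-object limit quoted from Matloff above into concrete terms, the following sketch works out how many double-precision values fit under 2^31 − 1 bytes. It assumes 8 bytes per double and ignores the small fixed header each R vector carries; the variable names are illustrative only.

```r
# Each double-precision value occupies 8 bytes.
bytes_per_double <- 8

# Per-object limit quoted above: 2^31 - 1 bytes.
limit_bytes <- 2^31 - 1

# Largest numeric vector (element count) that fits under that limit,
# ignoring the small fixed header that every R vector carries.
max_doubles <- floor(limit_bytes / bytes_per_double)
print(max_doubles)

# Check the 8-bytes-per-element estimate against a real vector.
x <- numeric(1e6)
print(object.size(x))                        # about 8e6 bytes plus a small header
print(format(object.size(x), units = "MB"))
```

The same arithmetic gives a quick estimate of whether a data set of known row and column counts is likely to fit in memory before it is loaded.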

Automatic including garbage collection

rm() removes an object's name binding, but does not immediately free its memory

gc() forces garbage collection, at substantial computational cost

- Improving Performance
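The rm() and gc() behaviour described in the R Memory Management slide above can be observed directly. This is a small sketch, assuming roughly 80 MB of free memory; the object name big is made up for the example.

```r
# Allocate roughly 80 MB of doubles (assumed to fit comfortably in RAM).
big <- numeric(1e7)
print(exists("big"))   # the name is bound to the vector

# rm() only removes the binding; the memory is reclaimed later by the collector.
rm(big)
print(exists("big"))   # the name is gone

# gc() forces a collection now and returns a usage matrix
# with one row for Ncells and one for Vcells.
usage <- gc()
print(usage)
```

Because automatic collection normally runs on its own, an explicit gc() call is mainly useful for inspecting memory use, not as a routine step.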

From power to simplicity: C/C++, Parallel R, Vectorization, Byte-Code Compilation

The Art of R Programming, Chapter 14, Norman Matloff

- Improving Performance

| Method | Description |
|---|---|
| C/C++ | Call C programs from R |
| Vectorization | Recode for vectorization replacing slower functions |
| Byte-code compilation | cmpfun() |
| Parallel R | parallel package |

http://cran.r-project.org/web/views/HighPerformanceComputing.html

- R for Production Use
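Three of the methods from the Improving Performance table above can be tried with packages that ship with R. The sketch below is illustrative only: the sum-of-squares functions are invented for the example, the cluster size of two local workers is an assumption, and the C/C++ route is left out because it needs an external compiler.

```r
library(compiler)  # ships with R; provides cmpfun()
library(parallel)  # ships with R; provides makeCluster() and parLapply()

# Loop version: sum of squares computed one element at a time.
sum_squares_loop <- function(x) {
  total <- 0
  for (v in x) total <- total + v^2
  total
}

# Vectorized version: the arithmetic runs in R's internal compiled code.
sum_squares_vec <- function(x) sum(x^2)

# Byte-code compiled copy of the loop version.
sum_squares_cmp <- cmpfun(sum_squares_loop)

x <- as.numeric(1:1000)
print(sum_squares_loop(x))
print(sum_squares_vec(x))
print(sum_squares_cmp(x))

# Parallel R: spread independent tasks over two local worker processes.
cl <- makeCluster(2)
squares <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
print(unlist(squares))
```

All three functions return the same value; only their speed differs, which can be compared with system.time() on a larger input. The socket cluster created by makeCluster() also works on Windows.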
- Derivative Projects

RStudio – Integrated Development Environment (IDE)

Rattle – Data Mining Package

RExcel – (Statconn) Connection between R and Excel

Weka – Java-based data mining, statistical analysis by R

RapidMiner – Java-based Weka data mining, statistical analysis by R

Revolution Analytics – Scaling R for the Enterprise

Oracle R Enterprise – Integrated into Oracle

- About Statconn (as of March 2013)

Produces RAndFriends under noncommercial and commercial licenses

All the statconn tools work ONLY with 32-bit R

statconnDCOM

rcom (GPL2, but requires statconnDCOM)

RExcel 3.2.9 (ONLY 32-bit Office: 2003, 2007, 2010)

http://rcom.univie.ac.at/

- Sample Projects Using R

The Heritage Health Prize, Thomas Nguyen

A Direct Marketing In-flight Forecasting System, Shannon Terry & Ben Ogorek

Mining Twitter for Airline Consumer Sentiment, Jeffrey Breen

Alternative Data Sources for Measuring Market Sentiment and Events (Using R), Joe Rothermich

- The Julia Language

http://julialang.org/

- About Julia

High-level, high-performance dynamic open-source programming language for technical computing

Syntax similar to other technical computing environments

Features

Sophisticated compiler

Distributed parallel execution

Numerical accuracy

Extensive mathematical function library

Uses C, C++, Fortran libraries extensively

- Why Julia: “Because we are greedy”

http://julialang.org/blog/2012/04/nyc-open-stats-meetup-announcement/

- Julia Community

Hosted on github

550 mailing list subscribers (Google Groups)

1,500 github followers

190 forks

50 total contributors

As of September 2012, all contributors except the core developers had known of the language for six months or less

Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman

- The Julia Manual

http://docs.julialang.org/en/latest/manual/

- Julia Mathematical Functions

http://docs.julialang.org/en/latest/manual/mathematical-operations/

- Julia Standard Library

http://docs.julialang.org/en/latest/stdlib/

- Julia Performance
- Key Ingredients of Julia Performance

Rich type information, provided naturally by multiple dispatch

Aggressive code specialization against run-time types

Julia’s LLVM-based just-in-time (JIT) compiler

Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman

- Julia Performance Comparison

http://julialang.org/

- Julia Performance Comparison

Julia: A Fast Dynamic Language for Technical Computing (2012), Bezanson, Karpinski, Shah, Edelman

- Julia Recommendations

The software is ready for people already using C or Fortran

The software will develop into a usable scripting language for R users

Wait until version one for production use

- Conclusion

R provides production-ready software for statistical analysis

Julia merits personal investment and promises high performance