Criticism: “Apples to Oranges”
16 cores vs. 20 cores
Most SAS/STAT PROCs (including PROC GENMOD) run single-threaded.
SAS/STAT: 91 PROCs
• 69 single-threaded
• 13 multi-threaded
• 9 distributed (if you license SAS HP Statistics)
2013: SAS Benchmark
PROC HPGENSELECT
– SAS/STAT
– SAS High Performance Statistics
Massive grid (140/144 nodes)
– 16 cores per node
– 2,240/2,304 cores
Conclusion: SAS on 2,304 cores is competitive with RRE on 20 cores.
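The core counts on this slide follow directly from the node counts, and the hardware gap behind the conclusion can be made explicit. A minimal sketch using only the figures quoted above; the variable names are illustrative, not from the benchmark itself:

```python
# Figures quoted on the slide.
CORES_PER_NODE = 16
SAS_NODES = (140, 144)   # nodes used / nodes in the grid, as written "140/144"
RRE_CORES = 20

# Total SAS cores for each node count.
sas_cores = [nodes * CORES_PER_NODE for nodes in SAS_NODES]

# How many times more cores the SAS grid has than the RRE setup.
core_ratio = max(sas_cores) / RRE_CORES

print(f"SAS cores: {sas_cores[0]:,} / {sas_cores[1]:,}")
print(f"SAS grid has {core_ratio:.1f}x the cores of the RRE setup")
```

On these numbers the SAS grid has roughly 115 times as many cores as the RRE configuration it is said to be competitive with.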
Honest Benchmarking
Compare RRE and SAS/STAT performance
– Same data
– Same environment
– Same tasks
Test under real-world conditions
Make the test fair and transparent
Why is RRE faster than SAS?
RRE supports scalable computing out of the box
– Multi-threaded processing
– Distributed processing
Legacy SAS is mostly single-threaded
– DATA Step processing
– Most SAS/STAT PROCs
SAS HP PROCs
9 new SAS PROCs
Bundled into SAS 9.4
Designed for scalability
Multiple operating modes:
– Single machine
– Distributed (must license SAS HP Statistics)
HP PROCs: Minimal Improvement
Linear regression, 20 predictors, N=5,000,000
Runtime, seconds:
– SAS PROC HPREG: 253.82
– SAS PROC REG: 267.17
– RRE rxLinMod: 6.8
HPREG running in single machine mode.
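The runtimes on this slide can be turned into relative figures. A minimal sketch, assuming the three chart values map to the legend in the order listed (HPREG 253.82 s, REG 267.17 s, rxLinMod 6.8 s); the helper name `speedup` is illustrative:

```python
# Runtimes in seconds as read off the chart.
# Assumption: values map to the legend entries in the order they appear.
runtimes = {
    "SAS PROC HPREG": 253.82,
    "SAS PROC REG": 267.17,
    "RRE rxLinMod": 6.8,
}

def speedup(baseline: str, candidate: str) -> float:
    """Return how many times faster `candidate` ran than `baseline`."""
    return runtimes[baseline] / runtimes[candidate]

# Fraction of PROC REG's runtime saved by switching to PROC HPREG.
hpreg_gain = 1 - runtimes["SAS PROC HPREG"] / runtimes["SAS PROC REG"]

print(f"HPREG vs REG: {hpreg_gain:.1%} less runtime")
print(f"rxLinMod vs REG: {speedup('SAS PROC REG', 'RRE rxLinMod'):.1f}x faster")
print(f"rxLinMod vs HPREG: {speedup('SAS PROC HPREG', 'RRE rxLinMod'):.1f}x faster")
```

Under that mapping, HPREG saves about 5% over PROC REG, while rxLinMod is roughly 37 to 39 times faster than either SAS procedure.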
Summary
RRE is faster than Legacy SAS:
– Same tasks
– Same hardware
RRE speed:
– Efficient engineering
– Multi-threaded and distributed processing
SAS performance claims:
– Massive hardware requirements
– Force you to license more software from SAS
– Don’t apply to Legacy SAS
Polling Question
Which of the following analytic software benefits is most important to you:
– A) Completing projects faster
– B) Building better predictive models
– C) High performance with low infrastructure costs
John Wallace, Founder & CEO
DataSong at a Glance
Background
– Approaching $1 trillion in revenue analyzed.
– $3 billion in marketing spend under our lens.
– Experienced 60+ person team based in San Francisco with offices in Seattle, Los Angeles, Singapore, and India.
– Founded in 2003 with a proven history of solving difficult analytics problems.
– Evolved from consulting through close partnerships with our clients.
Our Offerings
– Customer interaction insight that powers applications for customer-level revenue attribution, targeting, and media optimization.
– Descriptive and predictive modeling of hidden trends and relationships in big data.
– Custom development including applications, process automation, and decision support solutions.
We know Big Data. We analyze and provide the “so what”.
DataSong Architecture
DataSong Data Format (DDF) / Custom Variables
• Functions to read Hadoop output; xdf creation
• Exploratory data analysis (PMML)
• GAM survival models
• ETL
• Scoring for inference
• N marketing channels
• Scoring for prediction
• Behavioral variables
• 5 billion scores per day
• Promotional data per customer
• Overlay data
Where Speed Matters
3 key dimensions
● How many rows
● How many variables
● How many iterations of a model
Trade-offs for speed
● Sampling variance
● Test fewer features
● Have less understanding of the signal
This third dimension (model iterations) means we must multiply any single-run benchmark by N, the number of iterations.
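To make the multiplication concrete, here is a minimal sketch. It reuses the single-run runtimes from the linear regression benchmark slide (267.17 s for PROC REG, 6.8 s for rxLinMod). The iteration count of 50 is a hypothetical value chosen for illustration, and the sketch assumes iterations run one after another at the same cost each:

```python
def total_runtime(single_run_seconds: float, n_iterations: int) -> float:
    """Total wall-clock seconds for n sequential model fits of equal cost."""
    return single_run_seconds * n_iterations

# Hypothetical number of model iterations in a project (not from the slides).
N_ITERATIONS = 50

# Single-run runtimes in seconds, taken from the benchmark slide.
sas_total = total_runtime(267.17, N_ITERATIONS)
rre_total = total_runtime(6.8, N_ITERATIONS)

print(f"PROC REG: {sas_total / 3600:.2f} hours")
print(f"rxLinMod: {rre_total / 60:.2f} minutes")
```

With these assumed inputs, a gap of a few minutes per run grows into hours versus minutes across a modeling project.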