このページは http://www.slideshare.net/Rocketfuelinc/hadoops-or-had-oops の内容を掲載しています。
Predicting the most relevant ad at any point in time for every individual is how Rocket Fuel opti...
Predicting the most relevant ad at any point in time for every individual is how Rocket Fuel optimizes ROI for an advertiser. One of the factors influencing this prediction is a consumer’s online interactions and behavioral profile. With more than 45 billion interactions being processed daily, this data runs into several Petabytes in our Hadoop warehouse. Running machine-learning algorithms and Artificial Intelligence on this vast scale requires many practical issues to be addressed. First, behavioral patterns are shortlived, so to accurately reflect the tendencies of a consumer, we need to curate and refresh his or her profiles as quickly as possible while avoiding multiple scans over the raw data and dealing with issues like transient system outages. Second, we must address the difficulty of building models utilizing behavioral profiles without overwhelming our Hadoop cluster. At this scale, frequent refreshes of several models can place an undue burden on even a thousand-node cluster. In this talk, we will dive into (a) the practical challenges involved in designing a highly scalable and efficient solution to build behavioral profiles using Hadoop framework and (b) techniques for ensuring reliability and availability of mission critical machine learning pipelines.