Socialife - The project
● Social feed aggregation and recommendation app, preinstalled on all Sony devices (also available on the Play Store)
● Client developed by Sony Japan
● We develop and manage the APIs that provide data to the client
● All feeds are processed and stored on our platform
● The system analyzes the data and recommends other feeds to you
● Expected by the end of 2013: around 1,000,000 new users registered on the platform and 170,000 DAU (daily active users)
● All servers are on AWS; deployments and configuration management are handled by Chef
● Nexus and Jenkins are used for CI
System stats
● Components
  – Custom API (Java)
  – RabbitMQ
  – Redis
  – MongoDB (sharding)
  – Splunk
  – Varnish
  – Apache
  – Alfresco
● EC2
  – Production env (reserved instances): 43 nodes with current DAU
  – On-demand Beanstalk instances for scale-out
  – Staging env: 30 nodes (reserved instances for ½ day)
  – 10 Load Balancers
  – 25 Security Groups
  – 15 Key Pairs
  – US East region
● S3
  – 2 buckets
● VPC
  – 7 Network ACLs
  – 10 Elastic IPs
  – 1 VPC (2 in the future)
  – 1 Customer Gateway
  – 1 Internet Gateway
  – 1 Virtual Private Gateway
  – 6 Subnets
  – 5 Route Tables
  – 1 VPN Connection
● IAM
  – Multi-Factor Authentication Device (virtual token on smartphones)
  – 3 Groups
  – 18 Users
Advantages
● Our APIs are stateless, so you can scale out very easily. Nodes are created by Chef.
● Performance testing is very easy thanks to the vertical scalability EC2 provides: you can quickly create nodes with more CPU, RAM or I/O when you need them.
● The outage recovery plan relies on node snapshots (MongoDB) or Chef (the other, stateless nodes).
● Good user management through virtual MFA (VMFA), IAM, key pairs, certificates and user credentials.
● Good security with network ACLs and Security Groups.
● Good integration with Chef: Chef bootstraps the machines.
● Amazon support provides rapid response and customized consulting for the project.
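A toy model makes the point about stateless APIs concrete. This is a minimal Python sketch, not the project's Java code. `ApiNode` and `RoundRobinBalancer` are hypothetical names, and a plain dict stands in for the shared datastore (MongoDB/Redis). Because no node keeps anything between requests, any node can answer any request. Scaling out then only means adding a node to the rotation.

```python
import itertools
from dataclasses import dataclass


@dataclass
class ApiNode:
    """Hypothetical stateless API node: it keeps no per-user state in memory."""
    name: str

    def handle(self, request: dict, shared_store: dict) -> dict:
        # Everything the node needs comes from the request or the shared store
        # (a dict standing in for MongoDB/Redis), never from the node itself.
        user = request["user"]
        return {"user": user, "feeds": shared_store.get(user, []), "served_by": self.name}


class RoundRobinBalancer:
    """Toy round-robin dispatcher standing in for a real load balancer."""

    def __init__(self, nodes: list[ApiNode]):
        self.nodes = list(nodes)
        self._cycle = itertools.cycle(self.nodes)

    def add_node(self, node: ApiNode) -> None:
        # Scaling out: no session or data migration is needed because
        # no node holds state; the new node simply joins the rotation.
        self.nodes.append(node)
        self._cycle = itertools.cycle(self.nodes)

    def dispatch(self, request: dict, shared_store: dict) -> dict:
        return next(self._cycle).handle(request, shared_store)


if __name__ == "__main__":
    store = {"alice": ["feed-1", "feed-2"]}
    lb = RoundRobinBalancer([ApiNode("api-1"), ApiNode("api-2")])
    lb.add_node(ApiNode("api-3"))  # e.g. a new node bootstrapped by Chef (hypothetical)
    for _ in range(3):
        print(lb.dispatch({"user": "alice"}, store))
```

Each response depends only on the request and the shared store, so the feeds returned are identical whichever node serves them.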
Disadvantages
● You must adapt to predefined instance sizes; their resources (CPU, RAM...) cannot be customized.
● You have no control over the evolution of the products your service depends on.
● You don't have access to the logs of some resources (for example, load balancers).
● Risk of becoming locked in to AWS services, and the consequent difficulty of migrating to another data center.
Recommendations
● Run servers in more than one Availability Zone. This is strongly recommended to avoid total downtime in case of an outage.
● Analyze performance test results to choose the minimum number of nodes that will run 24/7 and the instance sizes to reserve. Reserved instances reduce the cost to 2/3.
● Prefer a large number of small server instances running close to 100% CPU over a few powerful machines with wasted resources. When load increases, launch new nodes and balance requests among them.
● Pre-warm the load balancers.
● Ask AWS support to raise the initial limit on the number of EC2 instances that can run simultaneously (20).
● For certain services, balance over TCP instead of HTTP. Internally, balancing requests across our API nodes over TCP solved some problems with HTTP requests whose sessions were not closed. We use HTTP balancing only for requests that reach the public Apache.
● Use CloudFormation to create network resources.
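The sizing and cost reasoning in the first three recommendations can be sketched as simple arithmetic. This is a minimal Python sketch under stated assumptions:

- Per-node capacity is the throughput measured in performance tests at the target CPU usage.
- The 2/3 reserved-instance factor is taken from the slide and modeled as a discount on the hourly price.
- The function names, the hourly price, the 730-hour month and the example load figures are hypothetical.

```python
import math


def nodes_needed(load_rps: float, node_capacity_rps: float) -> int:
    """Smallest number of nodes whose combined capacity covers the load.

    node_capacity_rps is assumed to come from performance tests at the
    target CPU usage (close to 100% for small instances).
    """
    return max(1, math.ceil(load_rps / node_capacity_rps))


def spread_across_azs(node_count: int, azs: list[str]) -> dict[str, int]:
    """Distribute nodes as evenly as possible over several Availability Zones,
    so that an outage in one zone does not take every node down."""
    base, extra = divmod(node_count, len(azs))
    return {az: base + (1 if i < extra else 0) for i, az in enumerate(azs)}


def monthly_cost(reserved_nodes: int, on_demand_nodes: int,
                 on_demand_hourly: float, on_demand_hours: float,
                 reserved_factor: float = 2 / 3,
                 hours_in_month: float = 730.0) -> float:
    """Rough monthly cost. Reserved nodes run 24/7 at reserved_factor times the
    on-demand hourly price (2/3 per the slide). On-demand nodes run only
    on_demand_hours per month. All prices are hypothetical inputs."""
    reserved = reserved_nodes * on_demand_hourly * reserved_factor * hours_in_month
    burst = on_demand_nodes * on_demand_hourly * on_demand_hours
    return reserved + burst


if __name__ == "__main__":
    # Hypothetical figures, not measurements from the project.
    capacity = 200.0                 # req/s per node at target CPU usage
    baseline, peak = 1500.0, 2600.0  # req/s
    reserved = nodes_needed(baseline, capacity)      # runs 24/7
    burst = nodes_needed(peak, capacity) - reserved  # launched on demand
    print("reserved:", reserved, spread_across_azs(reserved, ["us-east-1a", "us-east-1b"]))
    print("on demand at peak:", burst)
    print("monthly cost:", round(monthly_cost(reserved, burst, 0.10, 200), 2))
```

With these made-up numbers, 8 nodes would be reserved (4 per zone across two zones), and up to 5 on-demand nodes would be launched for peaks. Only the baseline nodes are reserved, because they are the ones running 24/7, where the reserved discount pays off.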