Key points

- Many nodes, each node running carbon-relay, the webapp, and one or more carbon-cache processes.
- Run at least two carbon-cache processes per node to utilize the hardware (typically one process per CPU core).
- All carbon-cache instances use the same schema definitions for whisper files.
- All monitoring agents (statsd/sensu/gdash/codahale/collectd/etc.) send and query metrics through a load-balancer front-end (HAProxy or ELB).
- Each carbon-relay may route metrics to any carbon-cache instance on any graphite server in the cluster.
- All carbon-relays use the 'consistent-hashing' method and have exactly the same DESTINATIONS list (carbon.conf DESTINATIONS. Order is important?)
- All webapp processes share exactly the same memcache instance(s) (local_settings.py MEMCACHE_HOSTS).
- Each webapp may query only local carbon-cache instances (local_settings.py CARBONLINK_HOSTS).
- Each webapp's CLUSTER_SERVERS may contain not only the other webapps but also the webapp itself (local_settings.py, as of version 0.9.10).
- Each webapp's CARBONLINK_HOSTS must contain only the local instances from DESTINATIONS (order is not important).
- In terms of AWS EC2, graphite nodes are supposed to be installed in the same Region.
- The aggregator is not that useful. It is better to aggregate somewhere else (statsd/diamond) and send the results to graphite.
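As a sketch, the settings named in the points above might look like this on one node (all hostnames, ports, and instance names here are illustrative, not defaults to copy):

```
# carbon.conf -- identical DESTINATIONS on every relay in the cluster
[relay]
RELAY_METHOD = consistent-hashing
DESTINATIONS = 10.0.0.1:2004:a, 10.0.0.1:2104:b, 10.0.0.2:2004:a, 10.0.0.2:2104:b

# local_settings.py on node 10.0.0.1
MEMCACHE_HOSTS = ['10.0.0.10:11211']                          # same for all webapps
CLUSTER_SERVERS = ['10.0.0.1:80', '10.0.0.2:80']              # may include itself (>= 0.9.10)
CARBONLINK_HOSTS = ['127.0.0.1:7002:a', '127.0.0.1:7102:b']   # local instances only
```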
webapp/graphite/storage.py

```python
STORE = Store(settings.DATA_DIRS, remote_hosts=settings.CLUSTER_SERVERS)

class Store:
  def __init__(self, directories=[], remote_hosts=[]):
    self.directories = directories
    self.remote_hosts = remote_hosts
    self.remote_stores = [ RemoteStore(host) for host in remote_hosts if not is_local_interface(host) ]
    ...

  def find_first(self, query):
    ...
    remote_requests = [ r.find(query) for r in self.remote_stores if r.available ]
    ...
```

It is safe to have exactly the same CLUSTER_SERVERS option for all webapps in a cluster (less template work with Chef/Puppet). There are, however, some edge cases: https://github.com/graphite-project/graphite-web/issues/222
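The `is_local_interface(host)` filter above is what lets every webapp share the same CLUSTER_SERVERS list: entries that resolve to one of the machine's own interfaces are simply skipped. A rough sketch of that idea (not graphite's exact implementation) is to test whether a socket can be bound to the host's address:

```python
import socket

def is_local_interface(host):
    """Sketch: a host counts as local if its address can be bound on
    this machine (binding only succeeds for local interface addresses).
    Not graphite's exact code -- an illustration of the idea."""
    hostname = host.split(':')[0]  # drop the port, e.g. "127.0.0.1:80"
    try:
        addr = socket.gethostbyname(hostname)
    except socket.error:
        return False  # unresolvable hosts are treated as remote
    try:
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.bind((addr, 0))
        sock.close()
        return True
    except socket.error:
        return False

# With an identical CLUSTER_SERVERS everywhere, each webapp ends up
# with only the *other* nodes in its remote_stores list.
cluster = ['127.0.0.1:80', 'graphite-2.example.com:80']
remotes = [h for h in cluster if not is_local_interface(h)]
```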
CARBONLINK_HOSTS should contain only the local carbon-cache instances, not the full DESTINATIONS list. The webapp will take care of selecting the proper carbon-cache instance for each metric, even though it has a different list of items in its hash ring.

webapp/graphite/render/datalib.py

```python
# Data retrieval API
def fetchData(requestContext, pathExpr):
  ...
  if requestContext['localOnly']:
    store = LOCAL_STORE
  else:
    store = STORE

  for dbFile in store.find(pathExpr):
    log.metric_access(dbFile.metric_path)
    dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime) )
    try:
      cachedResults = CarbonLink.query(dbFile.real_metric)
      results = mergeResults(dbResults, cachedResults)
    except:
      log.exception()
      results = dbResults

    if not results:
      continue
    ...
  return seriesList
```

https://answers.launchpad.net/graphite/+question/228472
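The reason the order of hosts does not matter (and why relays and webapps built from the same list agree on a metric's owner) is consistent hashing: each node's positions on the ring are derived from hashes, not from list order. A minimal sketch of the idea, not carbon's actual implementation (replica count and key format here are illustrative):

```python
import bisect
import hashlib

def _hash(key):
    # First 4 bytes of an md5 digest as an integer position on the ring.
    return int(hashlib.md5(key.encode('utf-8')).hexdigest()[:8], 16)

class ConsistentHashRing(object):
    """Simplified sketch of a consistent-hash ring (the real one lives
    in the carbon package)."""

    def __init__(self, nodes, replicas=100):
        entries = []
        for node in nodes:
            # Each node appears many times on the ring for an even spread.
            for i in range(replicas):
                entries.append((_hash('%s:%d' % (node, i)), node))
        entries.sort()
        self.positions = [pos for pos, _ in entries]
        self.nodes = [node for _, node in entries]

    def get_node(self, metric):
        # The owner is the first ring entry at or after the metric's hash,
        # wrapping around to the start of the ring if necessary.
        index = bisect.bisect_left(self.positions, _hash(metric)) % len(self.positions)
        return self.nodes[index]

# Two rings built from the same nodes agree on every metric's owner,
# regardless of the order the nodes were listed in.
relay_ring = ConsistentHashRing(['cache-a', 'cache-b', 'cache-c'])
webapp_ring = ConsistentHashRing(['cache-c', 'cache-a', 'cache-b'])
owner = relay_ring.get_node('servers.web1.loadavg.01')
```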