Storm

If you have any questions about the material below, contact us by sending an email to the Kung Fu Panda. We will get back to you soon.

Basics

Alternatives to Storm

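We can use custom-written queues and workers to do real-time processing: processes put the data to be processed onto queues, and workers take it off and process it. However, this approach has serious drawbacks. Wiring up the queues and workers by hand is tedious, the system is brittle because we are responsible for keeping every worker and queue running, and scaling out means repartitioning the data and reconfiguring the workers to know where to send messages.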
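A minimal sketch of the hand-rolled approach, assuming everything runs in a single JVM: a `LinkedBlockingQueue` stands in for the external message queue and a fixed thread pool stands in for the worker processes. The names `QueueWorkerSketch`, `countWords` and the `__EOF__` marker are hypothetical. Even this toy version has to handle shutdown signalling and shared-state updates by hand, which is the kind of plumbing Storm takes over.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical hand-rolled pipeline: the producer puts sentences on a queue,
// workers take them off, split them into words and update shared counts.
class QueueWorkerSketch {
    // Stop marker; assumes no real sentence is exactly this string.
    private static final String POISON_PILL = "__EOF__";

    static Map<String, Integer> countWords(List<String> sentences, int numWorkers) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        Map<String, Integer> counts = new ConcurrentHashMap<>();
        ExecutorService workers = Executors.newFixedThreadPool(numWorkers);

        // Each worker loops until it takes the stop marker off the queue.
        for (int i = 0; i < numWorkers; i++) {
            workers.execute(() -> {
                try {
                    String sentence;
                    while (!(sentence = queue.take()).equals(POISON_PILL)) {
                        for (String word : sentence.split("\\s+")) {
                            if (!word.isEmpty()) {
                                counts.merge(word, 1, Integer::sum); // atomic per key
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            for (String s : sentences) {
                queue.put(s);                 // producer side
            }
            for (int i = 0; i < numWorkers; i++) {
                queue.put(POISON_PILL);       // one stop marker per worker
            }
            workers.shutdown();
            if (!workers.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while feeding the queue", e);
        }
        return counts;
    }
}
```

Note that nothing here survives a crashed worker or a lost message, and adding machines would mean replacing the in-process queue with a networked one and partitioning it by hand.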

Advantages of using Storm

Storm Use Cases

Storm processes a continuous feed of data as it arrives in the system, rather than waiting for the data to be collected and processed in batches.

Nodes in a Storm cluster

There are two types of nodes in a Storm cluster: the master node, which runs the Nimbus daemon, and worker nodes, which each run a Supervisor daemon. They coordinate through ZooKeeper. The Nimbus and Supervisor daemons are fail-fast and stateless: their state is kept in ZooKeeper (or on local disk), so if any node is restarted, it picks up its previous state and continues working.
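The recovery idea can be modelled in a few lines. In this sketch a `ConcurrentHashMap` stands in for ZooKeeper, and `CoordinationStore`, `NimbusSketch`, `SupervisorSketch` and the `/assignments/...` path layout are all hypothetical; they are not Storm's or ZooKeeper's API. Because the daemons keep nothing important only in memory, a fresh instance started after a crash reads the same state back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for ZooKeeper: a shared store that outlives any single daemon.
class CoordinationStore {
    private final Map<String, List<String>> nodes = new ConcurrentHashMap<>();

    void write(String path, List<String> value) {
        nodes.put(path, new ArrayList<>(value));   // keep a defensive copy
    }

    List<String> read(String path) {
        return new ArrayList<>(nodes.getOrDefault(path, new ArrayList<>()));
    }
}

// Hypothetical master daemon: records its decisions in the store, not only in memory.
class NimbusSketch {
    private final CoordinationStore store;

    NimbusSketch(CoordinationStore store) {
        this.store = store;
    }

    void assign(String supervisorId, List<String> topologies) {
        store.write("/assignments/" + supervisorId, topologies);
    }
}

// Hypothetical worker-node daemon: keeps no state of its own across restarts,
// so a fresh instance simply reads its assignments back from the store.
class SupervisorSketch {
    private final List<String> assignments;

    SupervisorSketch(String supervisorId, CoordinationStore store) {
        this.assignments = store.read("/assignments/" + supervisorId);
    }

    List<String> assignments() {
        return new ArrayList<>(assignments);
    }
}
```

Usage: after `new NimbusSketch(zk).assign("node-1", List.of("word-count"))`, both the original `SupervisorSketch("node-1", zk)` and one created after a simulated restart report the same assignments.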

Topology

Components of a Storm topology

Stream

Spout

Bolt

Stream Groupings

Types of Stream Grouping

Example Topology

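		import org.apache.storm.topology.TopologyBuilder;   // backtype.storm.topology before Storm 1.0
		import org.apache.storm.tuple.Fields;               // backtype.storm.tuple before Storm 1.0

		// RandomSentenceSpout, SplitSentence and WordCount are user-defined components
		// (as in the storm-starter word-count example).

		// Start to create a topology.
		TopologyBuilder builder = new TopologyBuilder();

		// Add a spout named "sentences", implemented by RandomSentenceSpout.
		// The last argument is the parallelism hint: run it with 5 executors (threads).
		builder.setSpout("sentences", new RandomSentenceSpout(), 5);

		// Add a bolt named "split", implemented by SplitSentence, that reads from the
		// "sentences" spout using shuffle grouping (tuples spread evenly), with 8 executors.
		builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("sentences");

		// Add a bolt named "count", implemented by WordCount, that reads from the "split"
		// bolt using fields grouping on "word", so every occurrence of a given word goes
		// to the same task. Run it with 12 executors.
		builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));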
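To see why the "count" bolt uses a fields grouping on "word", here is a simplified, self-contained model of how the two groupings in the example route tuples to a bolt's tasks. It assumes shuffle grouping behaves like round-robin (Storm randomizes the order but also spreads tuples evenly) and fields grouping like hashing the field value modulo the number of tasks; Storm's exact hash function may differ. `GroupingSketch` and its methods are hypothetical names, not Storm classes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of two Storm stream groupings (not Storm's actual implementation).
class GroupingSketch {
    // Shuffle grouping, modelled as round-robin: spreads tuples evenly over tasks.
    static List<Integer> shuffleRoute(List<String> words, int numTasks) {
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            targets.add(i % numTasks);
        }
        return targets;
    }

    // Fields grouping, modelled as hash(field value) mod numTasks:
    // the same word always lands on the same task.
    static int fieldsRoute(String word, int numTasks) {
        return Math.floorMod(word.hashCode(), numTasks);
    }

    // Count words the way the "count" bolt's tasks would: one local map per task.
    static List<Map<String, Integer>> countPerTask(List<String> words, int numTasks) {
        List<Map<String, Integer>> perTask = new ArrayList<>();
        for (int t = 0; t < numTasks; t++) {
            perTask.add(new HashMap<>());
        }
        for (String w : words) {
            perTask.get(fieldsRoute(w, numTasks)).merge(w, 1, Integer::sum);
        }
        return perTask;
    }
}
```

With fields grouping, each word's full count lives in exactly one task's map. Routing the same words with shuffle grouping would scatter occurrences of one word across tasks, leaving each task with only a partial count.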
		
	

What is a Tuple in Storm

Lifecycle of a tuple

Storm Reliability

Storm Reliability: Developer TODOs

Removing message reliability

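If we don't care whether a message is successfully processed, we can turn off message reliability in any of the following three ways:

1. Set Config.TOPOLOGY_ACKERS to 0. Storm then calls the spout's ack method immediately after the spout emits a tuple, and tuple trees are not tracked.
2. Omit the message ID when emitting from the spout with SpoutOutputCollector.emit. This turns off tracking for that individual spout tuple only.
3. Emit unanchored tuples from a bolt. Because they are not anchored to any spout tuple, a failure to process them downstream does not cause any spout tuple to fail.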
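To see what turning reliability off gives up, here is a simplified model of the tracking Storm performs when it is on. Storm's documentation describes the acker as keeping, per spout tuple, a 64-bit "ack val": the XOR of every tuple ID created and/or acked in that tuple's tree, which returns to zero once the whole tree is processed. The sketch models only that bookkeeping; `AckerSketch` and its methods are hypothetical, not Storm's API, and timeouts and explicit failures are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Simplified model of Storm's acker bookkeeping (not Storm's real classes).
class AckerSketch {
    // spout tuple id -> XOR of all tuple ids created and/or acked in its tree
    private final Map<Long, Long> ackVals = new HashMap<>();
    // Seeded for reproducibility; real ids are random 64-bit values, so an
    // accidental zero or collision is possible but vanishingly unlikely.
    private final Random random = new Random(42);

    // Spout emits a tuple with a message id: start tracking a new tree.
    long spoutEmit() {
        long rootId = random.nextLong();
        ackVals.put(rootId, rootId);                        // root created: XOR in
        return rootId;
    }

    // A bolt emits a new tuple anchored to the tree rooted at rootId.
    long anchoredEmit(long rootId) {
        long childId = random.nextLong();
        ackVals.merge(rootId, childId, (a, b) -> a ^ b);    // child created: XOR in
        return childId;
    }

    // A bolt acks tupleId, which belongs to the tree rooted at rootId.
    void ack(long rootId, long tupleId) {
        ackVals.merge(rootId, tupleId, (a, b) -> a ^ b);    // acked: XOR again
    }

    // Every id has been XORed in twice (created + acked) exactly when this is zero.
    boolean isComplete(long rootId) {
        return ackVals.get(rootId) == 0L;
    }
}
```

An unanchored emit never goes through `anchoredEmit`, so the acker never hears about that tuple and cannot hold the tree open if it is lost; with no ackers at all, none of this bookkeeping happens.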

Examples/References