ABONAMENTE VIDEO REDACȚIA
RO
EN
Numărul 154 Numărul 153
NOU
Numărul 152
Numărul 151 Numărul 150 Numărul 149 Numărul 148 Numărul 147 Numărul 146 Numărul 145 Numărul 144 Numărul 143 Numărul 142 Numărul 141 Numărul 140 Numărul 139 Numărul 138 Numărul 137 Numărul 136 Numărul 135 Numărul 134 Numărul 133 Numărul 132 Numărul 131 Numărul 130 Numărul 129 Numărul 128 Numărul 127 Numărul 126 Numărul 125 Numărul 124 Numărul 123 Numărul 122 Numărul 121 Numărul 120 Numărul 119 Numărul 118 Numărul 117 Numărul 116 Numărul 115 Numărul 114 Numărul 113 Numărul 112 Numărul 111 Numărul 110 Numărul 109 Numărul 108 Numărul 107 Numărul 106 Numărul 105 Numărul 104 Numărul 103 Numărul 102 Numărul 101 Numărul 100 Numărul 99 Numărul 98 Numărul 97 Numărul 96 Numărul 95 Numărul 94 Numărul 93 Numărul 92 Numărul 91 Numărul 90 Numărul 89 Numărul 88 Numărul 87 Numărul 86 Numărul 85 Numărul 84 Numărul 83 Numărul 82 Numărul 81 Numărul 80 Numărul 79 Numărul 78 Numărul 77 Numărul 76 Numărul 75 Numărul 74 Numărul 73 Numărul 72 Numărul 71 Numărul 70 Numărul 69 Numărul 68 Numărul 67 Numărul 66 Numărul 65 Numărul 64 Numărul 63 Numărul 62 Numărul 61 Numărul 60 Numărul 59 Numărul 58 Numărul 57 Numărul 56 Numărul 55 Numărul 54 Numărul 53 Numărul 52 Numărul 51 Numărul 50 Numărul 49 Numărul 48 Numărul 47 Numărul 46 Numărul 45 Numărul 44 Numărul 43 Numărul 42 Numărul 41 Numărul 40 Numărul 39 Numărul 38 Numărul 37 Numărul 36 Numărul 35 Numărul 34 Numărul 33 Numărul 32 Numărul 31 Numărul 30 Numărul 29 Numărul 28 Numărul 27 Numărul 26 Numărul 25 Numărul 24 Numărul 23 Numărul 22 Numărul 21 Numărul 20 Numărul 19 Numărul 18 Numărul 17 Numărul 16 Numărul 15 Numărul 14 Numărul 13 Numărul 12 Numărul 11 Numărul 10 Numărul 9 Numărul 8 Numărul 7 Numărul 6 Numărul 5 Numărul 4 Numărul 3 Numărul 2 Numărul 1
×
▼ LISTĂ EDIȚII ▼
Numărul 21
Abonamente

Tick Tock on Beanstalkd Message Queues

Tudor Mărghidanu
Software Architect
@Yardi România



PROGRAMARE

Time is generally a pretty restrictive dimension; all the more so in IT, where every product, regardless of its stage of development, is submitted to this measurement. Moreover, IT developers have divided time into different categories and the resources that are allocated to each project are mainly targeted at increasing time efficiency concerning the development of the product. In this article, I will only talk about the time allocated for the execution of one application during a given session.

I"m sure many of you are familiar with the notion of message queues, especially if you"ve had the opportunity to work with applications that work with asynchronous operations. These message queues offer a series of undeniable advantages, such as:

  • Decoupling - The separation of the application"s logic
  • Scalability - More clients can process data at the same time
  • Redundancy - The errors are not lost and you can start re-processing them

There are more services that help implement these message queues, but in this article I will only talk about Beanstalkd.

Beanstalkd

Beanstalkd is a service with a generic interface, which was developed in order to cut down on the latency period between the processes of an application that requires longer execution time. Thanks to its generic interface, this service represents a major scalability factor within the application that needs to be developed. Beanstalkd doesn"t require any implementation limit (language or marshalling) because it uses PUSH sockets for communication and has a very simple protocol. I will explain some of the more important terms in the Beanstalkd terminology so that you get a better view on what the rest of the article will focus on:

  • Job: It represents the message itself, serialized according to personal criteria or according to the user library that is being used.
  • Tube: The Namespace that is used for the queue. Beanstalkd accepts more queues at the same time.
  • Producer: The process that deals with queueing the messages on the tubes.
  • Consumer: The processes that use the messages which were queued on one or more tubes.
  • Operations: The Producer or the Consumer can perform the following operations with the jobs posted on the tubes.
    • put
    • reserve
    • delete
    • release
    • bury

Problem

My belief is that a good learning method is one that offers examples, so I thought about a problem for this article:

"Build a web application where the users may upload video files in various formats, so that they would be available for display on one of the application"s pages."

The importance of this statement comes from the fact that it is vague enough to leave room for the scalability of the problem, which in turn triggers the scalability of the solution; but its beauty lies in fact in Beanstalkd"s simplicity.

Once the client is implemented, its execution can be scaled both vertically (more system processes on the same machine) and horizontally (more system processes on different machines), using the same initial rule.

The figure illustrates the data flow within the application and the way in which the web application should interact with the users, by using a layer of shared storage.

Think about a situation where users upload a set of video files on a given page, videos that enter a pre-defined process that fulfills two major functions: the first one deals with storing the file in a pre-defined persistence layer (distributed file system or database); the second function prepares and writes a message that contains information pointing you to the reference in the persistence layer. From this point on, the operation becomes an asynchronous and distributed one; if the ratio between the number of consumers and the frequency of input data was correctly determined, the files that were uploaded should be processed in a short time.

package MyApp::Globals;
# ... More static properties ...
use JSON::XS;
use Beanstalk::Client;
class_has "message_queue" => (
	is => "ro",
	isa => "Beanstalk::Client",
	default => sub {
		return Beanstalk::Client->new(
{
# NOTE: This usually should come from a configuration
file...
	server => "localhost",
# Making sure we serialize/deserialize via JSON.
encoder => sub { encode_json( shift() ); },
decoder => sub { decode_json( shift() ); },
		}
	);
}
);
package MyApp::Web::Controllers::Videos;
# ...
sub upload {
	my $self = shift();
	# Retrieving the uploaded video.
	my $video = $self->req()->upload( "video" );
# Additional content and headers validation ...
# Storing the video in the persistance layer ...
my $object = MyApp::Globals->context()
	->dfs()->raw_videos(
		{
			filename => $video->filename(),
			headers => $video->headers()->to_hash(),
			data => $video->slurp(),
			# ... additional user data
		}
	);
# Making sure we use the right tube for sending the #
data.
MyApp::Globals->message_queue()
	->use( "raw_videos" );
# Storing the data in the queue...
MyApp::Globals->message_queue()
	->put(
{
priority => 10000,
data => $object->pack(),
	# Serialization occurs automatically ...
	}
);
}

The consumers work as fast as they can, requesting messages from Beanstalkd as they process the data. At this point, we change the status of the message as we go along. In this way, we can track the number of times the program was run correctly and also the number of mistakes we found. If we encounter an error, we can change the status of the messages that we marked as wrong once the problem was solved.

Another important aspect is that the parallel connection of the consumers can be achieved through system processes, which leads to a considerable ease of management and to the elimination of resource locking and memory leaks.

# Getting messages only from these tubes ...
MyApp::Globals->message_queue()
->watch_only( "raw_videos" );

while( 1 ) {
# Retrieving a job from the message queue and
# marking it as reserved...
my $job = MyApp::Globals->message_queue()
->reserve();

eval {
my $data = $job->args();
# Automatic data deserialization ...
# Doing the magic on the data here ...
};

# In case of an error we signal the error in
# back-end and budy the job.
if( my $error = $@ ) {
	$logger->log_error( $error );
	$job->bury();
} else {
	$job->delete();
	# If everything is ok we simply delete the job
	# from the tube!
	}
}

Conclusions

On a more personal note, I"ve always liked simple and elegant solutions that involve a minimum set of rules and simple terminology. Beanstalkd is a perfect example of this. But it is also important to note that the introduction of this service represents, to a certain degree, an integrating effort and no one should try to re-invent the wheel at this point in the development of the application.

Another vital aspect is the fact that, using a distributed system in this manner allows for both a compressing/dilation of time and a very obvious fragmentation of the execution process. Therefore, a process which, running sequentially, could take a few weeks to complete may be reduced to a few days or even a few hours, depending on the duration of the basic process.

Pros

  • Speed
  • Persistence
  • It doesn"t require any serialized model

Cons

  • The distributed mode is only supported in the client
  • Lack of a security model

LANSAREA NUMĂRULUI 154

AI vs. Tradițional, alternative

Marți, 29 Aprilie, ora 18:00

sediul GlobalLogic

Facebook Meetup StreamEvent YouTube

NUMĂRUL 153 - Generative AI

Sponsori

  • BT Code Crafters
  • Bosch
  • Betfair
  • MHP
  • BoatyardX
  • .msg systems
  • P3 group
  • Ing Hubs
  • Cognizant Softvision
  • GlobalLogic
  • Colors in projects