Pig revisited

Recently, I revisited Pig. Pig is a language that allows you to analyse data sets in a Hadoop environment. It was created at Yahoo to circumvent the technicalities of writing a MapReduce job in Java. Yahoo claims that most of its queries on a Hadoop platform can be replaced by a Pig script. As Pig is… Read More »

Oops, how much tablespace is left?

A few days ago, I was asked to load some tables in Oracle. A rather trivial request, but I wasn’t sure whether enough tablespace was left. From the table definition, I learned which tablespace was used. After that I ran the query below to see how much tablespace was actually left. I want to… Read More »
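
The query itself sits behind the Read More link; as a minimal sketch of the idea, free space per tablespace can be read from Oracle's DBA_FREE_SPACE view (or USER_FREE_SPACE if you lack DBA rights). The snippet below wraps that query in JDBC; the connection URL, user and password are placeholders and the Oracle JDBC driver must be on the classpath.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FreeTablespace {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details; replace with your own host, SID, user and password.
        String url = "jdbc:oracle:thin:@localhost:1521:ORCL";
        try (Connection con = DriverManager.getConnection(url, "scott", "tiger");
             Statement st = con.createStatement();
             // Free megabytes per tablespace, taken from the DBA_FREE_SPACE view.
             ResultSet rs = st.executeQuery(
                 "SELECT tablespace_name, ROUND(SUM(bytes)/1024/1024) AS free_mb "
               + "FROM dba_free_space GROUP BY tablespace_name")) {
            while (rs.next()) {
                System.out.println(rs.getString("tablespace_name") + ": "
                                   + rs.getLong("free_mb") + " MB free");
            }
        }
    }
}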


Sqoop is a tool that allows you to ship data from an RDBMS to a Hadoop platform. Let us take an example to clarify this. One may have some data in a MySQL table persons, within the database thom. This database is stored on a server. The data can be accessed with the knowledge of the… Read More »
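
Sqoop is normally driven from the command line. As a sketch only, the snippet below assembles and launches such an import for the persons table in database thom from Java; the host name, username, password file and target directory are placeholders, and the sqoop binary is assumed to be on the PATH of the client machine.

import java.util.Arrays;
import java.util.List;

public class SqoopImportPersons {
    public static void main(String[] args) throws Exception {
        // Placeholder host, credentials and target directory; the table "persons"
        // and database "thom" come from the example in the post.
        List<String> cmd = Arrays.asList(
            "sqoop", "import",
            "--connect", "jdbc:mysql://dbhost/thom",
            "--table", "persons",
            "--username", "thom",
            "--password-file", "/user/thom/.mysql-password",
            "--target-dir", "/user/thom/persons");
        Process p = new ProcessBuilder(cmd)
            .inheritIO()          // show Sqoop's own output on this console
            .start();
        System.exit(p.waitFor()); // propagate Sqoop's exit code
    }
}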

Pushing files via Netcat

Netcat is a Unix utility to investigate network connections. It has now been ported to Windows, so we can query network connections on a Windows platform with netcat (nc) as well. A nice possibility is to push files via nc from one machine to another. Assume for the moment that both machines have netcat… Read More »
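
The post uses nc itself for the transfer; purely as an illustration of what happens under the hood, here is a minimal Java sketch of the receiving side, which listens on a port and writes every incoming byte to a file. The port number and file name are arbitrary choices for this sketch; the sending side would then be nc (or a plain Socket) pointed at that port.

import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Receiving side: roughly what "nc -l <port> > received.bin" does.
public class ReceiveFile {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000);
             Socket client = server.accept();              // wait for the sender
             InputStream in = client.getInputStream()) {
            // Copy every byte that arrives on the socket straight into a file.
            Files.copy(in, Paths.get("received.bin"), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}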


Flume allows you to transfer messages directly into a file. It even allows such files to be stored on Hadoop. This opens a way to capture messages in a file that is stored on Hadoop, ready to be analysed. The example collects a series of events from a log. The file is then… Read More »
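
The collecting and storing side of Flume lives in the agent's configuration (source, channel and sink). As a small Java sketch of the sending side only, the snippet below hands one log line to an agent via Flume's RPC client API; it assumes an agent with an Avro source listening on localhost port 41414, whose sink then writes the event to a file.

import java.nio.charset.StandardCharsets;
import org.apache.flume.Event;
import org.apache.flume.api.RpcClient;
import org.apache.flume.api.RpcClientFactory;
import org.apache.flume.event.EventBuilder;

public class SendLogLine {
    public static void main(String[] args) throws Exception {
        // Assumes a Flume agent with an Avro source on this host and port.
        RpcClient client = RpcClientFactory.getDefaultInstance("localhost", 41414);
        try {
            // Wrap one log line in a Flume event and hand it to the agent;
            // the agent's sink (for example an HDFS sink) writes it to a file.
            Event event = EventBuilder.withBody("a line from the log", StandardCharsets.UTF_8);
            client.append(event);
        } finally {
            client.close();
        }
    }
}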


I encountered the term “serialise”. But what does it mean? I understood it when I read a comment explaining that data structures can be created inside, say, PHP. One may think of an object or an array. Such data structures can only be used inside PHP and they cannot be transported outside… Read More »
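
The post talks about PHP; the same idea can be illustrated in Java, where the built-in ObjectOutputStream turns an in-memory object into a stream of bytes that can be written to a file or sent over the network, and ObjectInputStream rebuilds it later. The Person class below is made up for the example.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerialiseDemo {
    // A made-up data structure for the example.
    static class Person implements Serializable {
        String name;
        int number;
        Person(String name, int number) { this.name = name; this.number = number; }
    }

    public static void main(String[] args) throws Exception {
        // Serialise: turn the in-memory object into bytes in a file.
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream("person.ser"))) {
            out.writeObject(new Person("thom", 1));
        }
        // Deserialise: rebuild the object from those bytes, possibly on another machine.
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream("person.ser"))) {
            Person p = (Person) in.readObject();
            System.out.println(p.name + " " + p.number);
        }
    }
}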

Avro in Java

Another example shows a similar idea. In this example a stream is created that consists of three objects, each containing a name and a number. Once the stream is created, it is serialised; in other words, the stream is prepared to be stored. It is stored in a file called “test.avro”. Before… Read More »
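
The post's own code is behind the link; along the lines the excerpt describes, a minimal sketch with Avro's generic Java API could look as follows. It writes three records with a name and a number to test.avro; the exact schema is an assumption made for this sketch.

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WriteTestAvro {
    public static void main(String[] args) throws Exception {
        // Assumed schema: one record type with a name and a number.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"person\",\"fields\":["
          + "{\"name\":\"name\",\"type\":\"string\"},"
          + "{\"name\":\"number\",\"type\":\"int\"}]}");

        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(schema))) {
            writer.create(schema, new File("test.avro"));
            // Three objects with a name and a number, serialised into test.avro.
            for (int i = 1; i <= 3; i++) {
                GenericRecord rec = new GenericData.Record(schema);
                rec.put("name", "person" + i);
                rec.put("number", i);
                writer.append(rec);
            }
        }
    }
}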

Sending data via AVRO

I got a better understanding when I used AVRO to write data via PHP and read them via Java. It demonstrated to me how data can be written in one language and subsequently read in another. I use a file to which PHP writes the data. Subsequently the data can… Read More »
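
The Java side of such an exchange can be sketched as below, assuming PHP has produced an Avro container file called, say, data.avro; the file name is a placeholder. An Avro container file carries its schema, so the generic reader needs no schema up front.

import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadFromPhp {
    public static void main(String[] args) throws Exception {
        // "data.avro" is an assumed name for the container file written by PHP.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(new File("data.avro"),
                                      new GenericDatumReader<GenericRecord>())) {
            // Print every record that PHP wrote into the file.
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}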

Avro – getting it to work

When you read about Hadoop, you come across AVRO. This is a mechanism to exchange data via streams, and it is named after the famous British aircraft manufacturer that, amongst many other types, delivered the Lancaster that helped to liberate Europe. AVRO can be implemented in many languages, amongst them PHP. Before continuing, let us… Read More »

How to find a file on Linux

There are zillions of small Linux commands that make life easy. Those tiny commands may save you hours of time and tons of frustration. Whenever I look at the terminal of one of my colleagues, I discover yet another trick that they use to make life easy. May I present a little command… Read More »