Flume can transfer messages directly into a file. It even allows such files to be stored on Hadoop. This makes it possible to capture messages in a file that is stored on Hadoop, ready to be analysed. The example is a series of events that are collected from a log. The file is then… Read More »


I encountered the term “serialise”. But what does it mean? I understood the term “serialise” when I read a comment that explained that data structures can be created inside, say, PHP. One may think of an object or an array. Such data structures can only be used inside PHP and they cannot be transported outside… Read More »
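
As a rough illustration of the idea in this excerpt, here is a minimal sketch in Python rather than PHP. It assumes JSON as the serialisation format, which the post itself does not state, and the `record` structure and both function names are made up for the example. The point is only that an in-memory structure becomes a plain string that can leave the program and be rebuilt elsewhere.

```python
import json

# Hypothetical in-memory structure, standing in for the PHP object or array.
record = {"name": "example", "numbers": [1, 2, 3]}


def serialise(data):
    """Turn an in-memory structure into a string that can be stored or sent."""
    return json.dumps(data)


def deserialise(text):
    """Rebuild the in-memory structure from its serialised string form."""
    return json.loads(text)


wire = serialise(record)      # a plain string, usable outside this program
restored = deserialise(wire)  # an equivalent structure, rebuilt from the string

print(type(wire).__name__)
print(restored == record)
```

Any other program that understands the same format can read the string, which is what makes the exchange between languages possible.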

Avro in Java

Another example shows a similar idea. In this example a stream is created. This stream consists of 3 objects that contain a name and a number. Once the stream is created, it is serialised. In other words: the stream is prepared to be stored. It is stored in a file that is called “test.avro”. Before… Read More »

Sending data via AVRO

I got a better understanding when I used AVRO to write data via PHP and to read them via Java. It demonstrated to me how data can be written in one language and subsequently be read in another language. The PHP side writes the data to a file. Subsequently the data can… Read More »

Avro – getting it to work

When you read about Hadoop, you come across AVRO. This is a mechanism to exchange data via streams, and it is named after the famous British aircraft manufacturer that, among many other types, built the Lancaster that helped to liberate Europe. AVRO can be implemented in many languages, among them PHP. Before continuing let us… Read More »

How to find a file on Linux

There are zillions of small Linux commands that make life easy. Those tiny commands may save you hours of time and tons of frustration. Whenever I look at the terminal of one of my colleagues, I discover yet another trick that they use to make life easy. May I present a little command… Read More »

Add data on Big Data in Hive and Impala

This post provides info on how data may be added to a Big Data platform with the help of Hive and Impala. We start with a dataset that is stored on a Linux platform. We will show how these data can be stored on an HDFS/Hadoop platform. After that, we will show how these data… Read More »

Dropping a table in Oracle

Dropping a table is straightforward in Oracle. One simply issues a drop table statement. Let us assume we have table HH. When “drop table HH” is fired, the table is removed. However, an error is returned if the table doesn’t exist. Again: if table HH doesn’t exist and a SQL “drop table HH”… Read More »
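
The behaviour described in this excerpt can be sketched as follows. This is not Oracle: it uses Python's built-in `sqlite3` module with an in-memory database as a stand-in, because SQLite likewise raises an error when a missing table is dropped. The helper name `drop_table_quietly` is made up for the example, and Oracle's own error codes and syntax differ.

```python
import sqlite3


def drop_table_quietly(conn, table):
    """Drop a table; return False instead of raising if it does not exist.

    SQLite stand-in only: the table name is interpolated directly, so this
    sketch must not be used with untrusted input.
    """
    try:
        conn.execute(f'DROP TABLE "{table}"')
        return True
    except sqlite3.OperationalError as exc:
        # SQLite reports a missing table with a "no such table" message.
        if "no such table" in str(exc):
            return False
        raise  # any other error is not ours to swallow


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE HH (id INTEGER)")

first = drop_table_quietly(conn, "HH")   # table exists, so it is dropped
second = drop_table_quietly(conn, "HH")  # table is gone, error is absorbed

print(first, second)
```

The design choice is to absorb only the "table does not exist" case and let every other error surface, so that a script can be rerun without hiding real problems.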

Transpose a record in Oracle

Transposing a record in Oracle isn’t easy. I had a small table with several records and one value in each record. I wanted to transpose that table into one record with the values adjacent to each other. The question: how to accomplish this? Recently, Oracle introduced the pivot facility that allows this procedure. The code… Read More »

SQL Loader

The SQL Loader is a facility that allows you to load data files blazingly fast. It is able to do so because the data are written directly to disk without any overhead. It needs two files: a control file and a data file. The process generates a log file that provides information on whether the process has… Read More »