Hive – connecting from SQL Developer

As I see it, the big development now taking place in the world of Big Data is the creation of connectors. Such connectors enable us to continue using standard tools (R, for example) while the data are stored in Hadoop. I am very much impressed with Hive. Hive allows us to access data being stored… Read More »

R – the shortest name possible

For some reason, short names are popular for computer languages. Think of “C”. Another example is “R”. R reminds me a bit of Matlab; it is an easy-to-learn language with immense statistical possibilities. It is compared to today’s giants, such as SAS. The advantage of R is that it is widely accepted by the… Read More »

Hive, SQL on Hadoop

In a previous post, I discussed the difficulty of using Hadoop with its Big Data structure. One must write two different Java programs: a so-called mapping program and a reduce program.
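The split described above can be illustrated in plain Python: one function plays the role of the mapping program, another the reduce program, with a sort in between standing in for Hadoop's shuffle phase. This is only a local sketch of the MapReduce idea (a word count), not actual Hadoop code; the function names are my own.

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    # The mapping program: emit a (key, value) pair per word.
    for word in line.split():
        yield (word.lower(), 1)

def reducer(key, values):
    # The reduce program: combine all values sharing a key.
    return (key, sum(values))

def run_job(lines):
    # Hadoop performs this shuffle/sort between the two programs.
    pairs = sorted(p for line in lines for p in mapper(line))
    return dict(reducer(k, (v for _, v in group))
                for k, group in groupby(pairs, key=itemgetter(0)))

counts = run_job(["big data", "big plans"])
print(counts)  # {'big': 2, 'data': 1, 'plans': 1}
```

The point of the sketch is that even a trivial count forces the programmer to think in two separate programs, which is exactly the burden Hive's SQL layer removes.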

Pig: yet another approach to handling big data

In another post, I discussed how Java can be used to analyse data in a Big Data environment. The problem then lies with Java itself. Java is not a tool for the faint-hearted; it is difficult. Moreover, one must comply with a structure in which one writes two programs: a mapping program and a… Read More »

Python: another language to access Big Data

In an earlier post, I showed how Java could be used to access Big Data. I also stated that I had many problems with Java itself. I noted that I was not the only one to have issues with Java. A much easier language is Python. This language is really easy to learn and it… Read More »
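One concrete way Python is commonly used with Hadoop is Hadoop Streaming, where the mapper and reducer are ordinary scripts that read stdin and write tab-separated key/value pairs to stdout. Below is a minimal mapper sketch for a word count; the script name and the exact streaming invocation are assumptions for illustration.

```python
import sys

def map_lines(lines):
    # Hadoop Streaming feeds raw input lines on stdin; the mapper
    # emits one tab-separated "key<TAB>value" pair per word.
    out = []
    for line in lines:
        for word in line.split():
            out.append(f"{word.lower()}\t1")
    return out

if __name__ == "__main__":
    for pair in map_lines(sys.stdin):
        print(pair)
```

Such a script would be passed to Hadoop with something like `hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py …` (the jar path varies per installation), which avoids writing any Java at all.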

Hadoop: my first Java program

Today, I created a Java program to get myself acquainted with the usage of Hadoop. I started from an existing Java program, which can be found at https://github.com/tomwhite/hadoop-book/blob/master/ch02/src/main/java/OldMaxTemperature.java. I tweaked this program to adjust it to my existing situation.
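The cited program computes the maximum temperature per year from weather records. As a sketch of the same logic outside Hadoop, here is a small Python version; the simplified record layout of (year, temperature) pairs is my assumption, not the NCDC line format that the book's Java program actually parses.

```python
def max_temperature(records):
    # records: iterable of (year, temperature) pairs.
    # Returns the highest temperature seen per year, which is
    # what the mapper/reducer pair computes together in Hadoop.
    maxima = {}
    for year, temp in records:
        if year not in maxima or temp > maxima[year]:
            maxima[year] = temp
    return maxima

print(max_temperature([(1950, 0), (1950, 22), (1949, 111)]))
# {1950: 22, 1949: 111}
```

In the real Hadoop version, the mapper extracts the (year, temperature) pairs from raw records and the reducer performs the per-year maximum.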

A nice utility to investigate files on Unix

Today, I worked with the Unix awk utility. This is an extremely powerful utility for investigating text files on a Unix platform. It can be invoked from the terminal command line. The command must start with the keyword awk, which is followed by a script enclosed in quotes. After the quotes comes the text file… Read More »
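As a small sketch of the command shape described above — `awk '<script>' <textfile>` — the following creates a sample file and prints the second column of every line whose first column exceeds 100 (the file name and contents are hypothetical):

```shell
# Create a small sample file to run awk against.
printf '50 small\n200 large\n' > sizes.txt
# Script between quotes, then the text file: print column 2
# for every line where column 1 is greater than 100.
awk '$1 > 100 { print $2 }' sizes.txt   # prints: large
```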

Hadoop

Everyone talks about big data and Hadoop. Someone even compared it to teenage sex: everyone talks about it, everyone knows someone who does it, but hardly anyone actually does it. I just tried Hadoop to see what it is all about. I made two attempts to install Hadoop. One attempt involved installing Hadoop 1.0.3. I… Read More »

Slowly Changing Dimensions Type 2

Just to get myself acquainted with the new Informatica version, I created a mapping in which SCD Type 2 was implemented. The mapping is shown here. In the first step, the input data are read. Let us assume that these records are read. The records contain a number and a name: number Name 1 Tom 2 ine… Read More »
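The idea behind a Type 2 slowly changing dimension can be sketched in a few lines of Python: when a tracked attribute changes, the current row is closed (end-dated) and a new current row is inserted, so the full history is preserved. This is a conceptual sketch, not Informatica code; the field names are my own.

```python
from datetime import date

def apply_scd2(dimension, number, name, today):
    # dimension: list of rows, each a dict with number, name,
    # start_date and end_date (end_date None = current row).
    current = next((r for r in dimension
                    if r["number"] == number and r["end_date"] is None), None)
    if current is None:
        # New member: insert a first current row.
        dimension.append({"number": number, "name": name,
                          "start_date": today, "end_date": None})
    elif current["name"] != name:
        # Changed attribute: close the old row, insert a new one.
        current["end_date"] = today
        dimension.append({"number": number, "name": name,
                          "start_date": today, "end_date": None})
    # An unchanged record leaves the dimension untouched.
    return dimension

dim = []
apply_scd2(dim, 1, "Tom", date(2014, 1, 1))
apply_scd2(dim, 1, "Thomas", date(2014, 6, 1))
print(len(dim))  # 2 rows: the end-dated 'Tom' row and the current 'Thomas' row
```

In an Informatica mapping, the lookup against the current row and the insert/update split would be done with a Lookup transformation and a Router, but the bookkeeping is the same.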

Use Case, Business Events and Time Events

In a previous post, I showed the context diagram. I then continued by saying that each of the arrows that flow to and from the bubble in the middle can be translated into a use case. But one may take a slightly different view: each of these arrows is either a business event or a time event. A business… Read More »