Create a Hive table – 2 ways

In this short note, I show two different ways to create a table in Hive. The first starts with a file that is already available on HDFS, and we create a table on top of that file. This table is defined as an external table: the file stays where it is and is exposed as a table. The code to be executed…
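To make the external-table approach concrete, here is a minimal sketch that generates the HiveQL for an external table over a comma-delimited text file already on HDFS. The helper name `external_table_ddl`, the table name, the columns and the HDFS path are all hypothetical; the generated statement uses standard Hive DDL (`CREATE EXTERNAL TABLE ... ROW FORMAT DELIMITED ... LOCATION`). Note that `LOCATION` points at a directory, and Hive reads every file in it.

```python
# Hypothetical helper: builds HiveQL DDL for an external table over a
# delimited text file that already sits in an HDFS directory.
def external_table_ddl(table, columns, hdfs_dir, delimiter=","):
    """Return a CREATE EXTERNAL TABLE statement.

    columns  -- list of (name, hive_type) pairs
    hdfs_dir -- HDFS *directory* holding the file(s)
    """
    cols = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)\n"
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}'\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{hdfs_dir}';"
    )


if __name__ == "__main__":
    # Table name, columns and path are made up for illustration.
    print(external_table_ddl(
        "sales_ext",
        [("id", "INT"), ("product", "STRING"), ("amount", "DOUBLE")],
        "/user/hdfs/sales",
    ))
```

Because the table is external, dropping it in Hive by default removes only the table definition; the file on HDFS is left in place.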

With Python in Hive

This short note describes how an HDFS file can be stored in a Hive context. Once it is stored in a Hive context, it can be accessed from outside via ODBC. It is also possible to access the data as if it were a SQL-compliant database. The idea is that an abstraction is created on…
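One common way to combine Python with Hive is a streaming script called through Hive's `TRANSFORM` clause: Hive writes each row to the script's standard input as a tab-separated line and reads the script's standard output back as rows. The sketch below assumes that mechanism; the script name, table and columns are hypothetical, and upper-casing a column is only a placeholder transformation.

```python
import sys

# Hypothetical Hive usage (script, table and column names are made up):
#   ADD FILE /path/to/upper_name.py;
#   SELECT TRANSFORM (id, name)
#          USING 'python3 upper_name.py'
#          AS (id, name)
#   FROM customers;


def transform_line(line):
    """Turn one tab-separated input row into one tab-separated output row.

    Placeholder logic: upper-case the second column.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 1:
        fields[1] = fields[1].upper()
    return "\t".join(fields)


def main(stdin, stdout):
    # Hive sends one row per line on stdin and reads rows back from stdout.
    for line in stdin:
        stdout.write(transform_line(line) + "\n")


if __name__ == "__main__":
    main(sys.stdin, sys.stdout)
```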

Oracle ODI

The successor to OWB is Oracle Data Integrator (ODI). This tool has more functionality than OWB. In addition, its interface more or less guides the user through a series of steps. The idea is that one starts with a technical view in which the file locations, databases and schemas are declared. Once…

Dataflow in Oracle Warehouse Builder

I know that Oracle Warehouse Builder (OWB) has reached end of life. On the other hand, I encounter OWB quite often and it is interesting to see how it works. To investigate how it works, I created a dataflow. It is a trivial one: it consists of a file that must be read into Oracle.…

Docker container

Just this weekend I downloaded a Docker package from This package allows you to run very small, lightweight containers on your server that act as components, each performing a specific task. In a way, a container resembles a virtual machine. It has no direct connection to the host machine and it runs…

Reading an HDFS file in Python

In this note, I show how to get data from an HDFS platform into a Python programme. The idea is that we have data on HDFS and we would like to use these data in a Python programme. So we must connect to HDFS from within a Python programme, read the data, transform…
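The excerpt does not say which client library is used, so here is a minimal sketch that uses only the standard library and the WebHDFS REST API (`/webhdfs/v1/<path>?op=OPEN`). It assumes WebHDFS is enabled on the cluster; the host name is hypothetical, and the default port 9870 is the Hadoop 3 NameNode HTTP port (Hadoop 2 used 50070). The "transform" step here simply parses the file as CSV.

```python
import csv
import io
import urllib.parse
import urllib.request


def webhdfs_open_url(host, path, port=9870, user=None):
    """Build a WebHDFS OPEN URL for an absolute HDFS path (e.g. '/data/x.csv')."""
    params = {"op": "OPEN"}
    if user:
        params["user.name"] = user  # simple (non-Kerberos) authentication
    query = urllib.parse.urlencode(params)
    return f"http://{host}:{port}/webhdfs/v1{path}?{query}"


def parse_csv(text):
    """Transform step: split the raw file content into rows of strings."""
    return list(csv.reader(io.StringIO(text)))


def read_hdfs_csv(host, path, **kwargs):
    """Connect to HDFS, read the file and return its rows."""
    # The NameNode answers OPEN with a redirect to a DataNode;
    # urllib follows that redirect automatically.
    with urllib.request.urlopen(webhdfs_open_url(host, path, **kwargs)) as resp:
        return parse_csv(resp.read().decode("utf-8"))
```

A call such as `read_hdfs_csv("namenode.example", "/user/hdfs/sales.csv", user="hdfs")` would then return the file as a list of rows, ready for further processing.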

Putting a file on HDFS

Putting a file on HDFS is relatively easy. There are a few steps to take. Let us assume the file is on a Linux system. The first step is to copy the file to an area where it can be stored with the hdfs user as its owner. On my system, I have /tmp that…
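The steps can be scripted. The sketch below only builds the commands and, unless `dry_run` is switched off, merely prints them. Only the first step (copy to a staging area such as /tmp) comes from the text above; the `chown` and `hdfs dfs -put` steps, the use of `sudo`, and the helper names are assumptions about how the remaining steps typically look.

```python
import os
import shlex
import subprocess


def put_commands(local_file, hdfs_dir, staging_dir="/tmp", hdfs_user="hdfs"):
    """Return the command lines for staging a local file and uploading it."""
    staged = os.path.join(staging_dir, os.path.basename(local_file))
    return [
        ["cp", local_file, staged],                  # 1. copy to staging area
        ["sudo", "chown", hdfs_user, staged],        # 2. make hdfs the owner (assumed)
        ["sudo", "-u", hdfs_user, "hdfs", "dfs", "-put", staged, hdfs_dir],  # 3. upload (assumed)
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Paths are hypothetical.
    run(put_commands("/home/me/data.csv", "/user/hdfs"))
```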

Estimating with Python

It is relatively easy to do an estimation with a Python script. This is because Python works with matrices, and such matrices can be used as input to an estimation procedure. I created an example where a dataset is retrieved from Oracle. Then the dataset is converted into a matrix.…
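As an illustration of the dataset-to-matrix-to-estimate path, here is a minimal sketch of an ordinary least squares fit with NumPy. The excerpt does not name the estimation procedure, so OLS is an assumption, as is the row layout `(y, x1, x2, ...)`; such rows could come straight from a database cursor's `fetchall()`.

```python
import numpy as np


def estimate_ols(rows):
    """Ordinary least squares on rows of the form (y, x1, x2, ...).

    Returns the coefficient vector [intercept, b1, b2, ...].
    """
    data = np.asarray(rows, dtype=float)                      # dataset -> matrix
    y = data[:, 0]                                            # dependent variable
    X = np.column_stack([np.ones(len(data)), data[:, 1:]])   # add intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)              # solve min ||X b - y||
    return beta


if __name__ == "__main__":
    # Synthetic rows with y = 2 + 3x, for illustration only.
    sample = [(2 + 3 * x, x) for x in range(5)]
    print(estimate_ols(sample))
```

`np.linalg.lstsq` is used rather than inverting `X.T @ X` explicitly, because it is numerically more stable.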

Read and write by Python

Python seems to be a very convenient way to transfer data to and from Oracle. It can set up a connection, and it seems quite capable of transferring a matrix into a table and vice versa. The code that follows shows this. It first retrieves the content of a table. In a second step some…
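The read, transform and write-back cycle follows the standard Python DB-API pattern (`cursor.execute`, `fetchall`, `executemany`). To keep this sketch runnable without an Oracle database, `sqlite3` is swapped in; with Oracle you would obtain the connection from a driver such as python-oracledb (`oracledb.connect(user=..., password=..., dsn=...)`) instead. Table and column names are hypothetical, and named bind variables (`:id`) are used because both drivers accept them. Table names are inserted with f-strings, so they must come from trusted code, not user input.

```python
import sqlite3


def read_table(conn, table):
    """Retrieve a table as a matrix (list of row lists)."""
    cur = conn.cursor()
    cur.execute(f"SELECT id, amount FROM {table} ORDER BY id")
    return [list(row) for row in cur.fetchall()]


def write_table(conn, table, matrix):
    """Insert a matrix into a table, one row per matrix row."""
    cur = conn.cursor()
    cur.executemany(
        f"INSERT INTO {table} (id, amount) VALUES (:id, :amount)",
        [{"id": r[0], "amount": r[1]} for r in matrix],
    )
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for an Oracle connection
    conn.execute("CREATE TABLE src (id INTEGER, amount REAL)")
    conn.execute("CREATE TABLE dst (id INTEGER, amount REAL)")
    write_table(conn, "src", [[1, 10.0], [2, 20.0]])
    matrix = read_table(conn, "src")                  # table -> matrix
    doubled = [[i, a * 2] for i, a in matrix]         # some manipulation
    write_table(conn, "dst", doubled)                 # matrix -> table
    print(read_table(conn, "dst"))
```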

Python in a map reduce environment

I have written a very small Python programme that follows the mapper/reducer sequence. It replaces the more complicated set of Java programmes that might otherwise be written to implement a mapper/reducer sequence. The idea is relatively simple. We create a stream from an input file. That stream is…
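A minimal sketch of the mapper/reducer idea in Python, assuming a word count as the task (the excerpt does not say what the programme computes). With Hadoop Streaming the mapper and reducer would be separate scripts exchanging tab-separated lines, with Hadoop sorting the mapper output in between; here that shuffle/sort phase is simulated in one process with `sorted`.

```python
from itertools import groupby


def mapper(lines):
    """Emit a (word, 1) pair for every word in the input stream."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word.

    The pairs must be sorted by word, which is what the shuffle/sort
    phase between mapper and reducer guarantees.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


if __name__ == "__main__":
    # Local simulation of map -> sort -> reduce on a small sample stream.
    sample = ["the cat sat", "the dog sat"]
    for word, total in reducer(sorted(mapper(sample))):
        print(f"{word}\t{total}")
```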