Last week in Stream Processing & Analytics 10/3/2016

This is the 34th installment of my blog series around Stream Processing and Analytics.

As usual, find below the new blog articles, presentations, videos and software releases from last week:

News and Blog Posts

General

Comparison

Spark Streaming

Apache Kafka

Apache Flink

Concord

Oracle Stream Analytics

StreamSets

Apache NiFi / Hortonworks Data Flow (HDF)

New Presentations

New Videos

Upcoming Events

Please let me know if that is of interest. Please tweet your projects, blog posts, and meetups to @gschmutz to get them listed in next week’s edition!


Providing Oracle Stream Analytics 12c environment using Docker

Over the past two days I spent some time upgrading the Docker support I had created for Oracle Stream Explorer so that it works with Oracle Stream Analytics (the new name for Oracle Stream Explorer).

I guess I don’t have to introduce Docker anymore; it’s so common today!

Preparation

You can find the corresponding Docker project on my GitHub: https://github.com/gschmutz/dockerfiles

Due to the Oracle licensing agreement, the Oracle software itself cannot be provided in the GitHub project. For the same reason, it is not possible to upload a built image to Docker Hub.

So you first have to download the Java 8 SDK as well as the Stream Analytics Runtime using your own OTN login. Download the following 2 artifacts into the oracle-stream-analytics/dockerfiles/12.2.1/downloads folder.
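Before starting the build, it can help to confirm that the downloads actually landed in the expected folder. The following is only a minimal sketch: the function name check_downloads is hypothetical and not part of the GitHub project, and it merely counts files rather than checking their names.

```shell
# Hypothetical helper (not part of the GitHub project): verify that the
# downloads folder exists and holds at least the expected number of files.
check_downloads() {
    dir=$1
    expected=${2:-2}    # the post mentions 2 artifacts
    if [ ! -d "$dir" ]; then
        echo "missing folder: $dir" >&2
        return 1
    fi
    # Count regular files directly inside the folder (no subfolders).
    count=$(find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' ')
    if [ "$count" -lt "$expected" ]; then
        echo "found $count file(s) in $dir, expected at least $expected" >&2
        return 1
    fi
    echo "ok: $count file(s) in $dir"
}
```

It could be called as check_downloads oracle-stream-analytics/dockerfiles/12.2.1/downloads from the root of the cloned project.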

Building the Oracle Stream Analytics Docker Install image

Navigate to the dockerfiles folder and run the buildDockerImage.sh script as root:

$ sh buildDockerImage.sh -v 12.2.1 -A

This will take a while if run for the first time, as it downloads the oracle-linux base image first. At the end you should see a message similar to the one below:

  WebLogic Docker Image for 'standalone' version 12.2.1 is ready to be extended: 
    
    --> gschmutz/oracle-osa:12.2.1-standalone

  Build completed in 171 seconds.

This indicates that the OSA base Docker image has been built successfully.

Be aware that this image is not yet executable: it only contains the software, without any domain.

Building an Oracle Stream Analytics Standalone domain

In order to use Oracle Stream Analytics, we have to build a domain. This can be done using Docker as well, extending the Oracle Stream Analytics image created above and creating an OSA domain. Currently there is one sample Dockerfile available in the samples folder which creates an Oracle Stream Analytics Standalone domain. In the future this will be enhanced with a domain connecting to Spark.

To build the 12.2.1 standalone domain, navigate to folder samples/1221-domain and run the following command (use the OSA_PASSWORD parameter to specify the OSA user password):

$ docker build -t 1221-domain --build-arg OSA_PASSWORD=<define> .

There are other build arguments you can use to override the default values of the Oracle Stream Analytics Standalone domain. They are documented in the GitHub project here.
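If you prefer not to type the password directly on the command line, the build command can be wrapped in a small function that prompts for it. This is a sketch under assumptions: the function name build_osa_domain is hypothetical, and it expects to be run from the samples/1221-domain folder. Keep in mind that values passed via --build-arg may still be visible later through docker history, so this only keeps the password out of your shell history.

```shell
# Hypothetical wrapper around the build command shown above.
# Assumes the current directory is samples/1221-domain.
build_osa_domain() {
    # Use OSA_PASSWORD from the environment if set, otherwise prompt for it.
    if [ -z "${OSA_PASSWORD:-}" ]; then
        printf 'OSA password: ' >&2
        stty -echo              # hide the typed characters
        read -r OSA_PASSWORD
        stty echo               # restore terminal echo
        printf '\n' >&2
    fi
    docker build -t 1221-domain --build-arg "OSA_PASSWORD=${OSA_PASSWORD}" .
}
```

Calling build_osa_domain without OSA_PASSWORD set prompts for the password and then runs the same docker build as above.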

Verify you now have this image in place with:

$ docker images
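To check for the image from a script rather than by eye, a small helper along these lines may be useful. It is a sketch: the function names image_exists and report_image are assumptions, while docker image inspect is a standard Docker CLI command that exits with a non-zero status when the image is not found.

```shell
# Hypothetical helper: succeeds only if the named image exists locally.
image_exists() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Print a short status line for the given image name.
report_image() {
    if image_exists "$1"; then
        echo "image $1 is available"
    else
        echo "image $1 not found"
        return 1
    fi
}
```

For the domain image built above, the call would be report_image 1221-domain.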

Running Oracle Stream Analytics server

To start the Oracle Stream Analytics server, simply run the 1221-domain image with docker run -d. The sample Dockerfile defines startwlevs.sh as the default CMD.

$ docker run -d --name=osa -p 9002:9002 1221-domain

Check the log by entering:

$ docker logs -f osa

After a couple of seconds, the OSA server should be up and running and you can access the Oracle Stream Analytics Web Console at http://localhost:9002/sx.
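Instead of watching the log, you can also poll the console URL until it responds. The following is a minimal sketch, assuming curl is installed on the host; the function name wait_for_url and the default of 30 retries with a 2 second delay are arbitrary choices, not values from the project.

```shell
# Hypothetical helper: poll a URL until it answers or the retries run out.
wait_for_url() {
    url=$1
    retries=${2:-30}    # number of attempts (assumed default)
    delay=${3:-2}       # seconds between attempts (assumed default)
    i=0
    while [ "$i" -lt "$retries" ]; do
        # -s: silent, -f: fail on HTTP errors, -o /dev/null: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "up: $url"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "timed out waiting for $url" >&2
    return 1
}
```

With the container started as above, wait_for_url http://localhost:9002/sx returns once the Web Console answers, or fails after the retries are used up.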

Connect with user osaadmin and the password you specified above.

Oracle Stream Analytics (OSA): the new Oracle Stream Explorer

A few days ago, Oracle released the new version of Oracle Stream Explorer and renamed it to Oracle Stream Analytics (OSA). This new version is an impressive release with over 15 new major features! It really deserves the name change.

Enhanced Patterns Library

The existing patterns have been enhanced substantially and now include Spatial, Statistical, General industry and Anomaly detection through streaming machine learning.

New Geo-spatial pattern

This pattern can be used to analyze streams containing geo-location data and determine how events relate to pre-defined geo-fences in your maps.


Integrated Expression Builder

The Expression Builder allows you to add calculated/derived fields to the Live Output Stream of an exploration, an important step towards the “streaming Excel sheet” idea of Oracle Stream Analytics.

It provides the ability to apply and insert mathematical and statistical calculations into the active live output stream. Once a new expression has been defined and validated, a column will be added next to the column of relevance. This new column can then be used in subsequent filters and explorations.

Support for Business Rules in Explorations

The Business Rules section of the Stream Analytics canvas provides the ability to apply the more traditional IF-THEN-ELSE constraints and clauses on the various properties of the event shape.

This capability enables the user to combine streaming query analytics based on temporal criteria with a collection of business rules that can arbitrarily affect the information in existing or new columns.

New streaming end point connections/targets

Oracle Stream Analytics supports new Event Stream sources and targets, such as MQTT, Apache Kafka and Twitter.

Kafka in particular is becoming more and more important in modern Big Data architectures, so I’m really pleased to see it available now.

We can now use Oracle GoldenGate to capture changes on any database table as they happen (CDC = change data capture), send these captured change events into Kafka using GoldenGate for BigData, and consume them from OSA to apply streaming analytics to them.

Scaling-Out with Spark Streaming

An OEP server is no longer the only runtime option. With Oracle Stream Analytics you can deploy and execute streaming applications to a Spark Streaming infrastructure.

The figure below shows how you can select one of the two possible runtime environments (Spark is grayed out because it is not yet configured in my environment).

spark

Better Insights with Catalog Topology Viewer and Navigation

Topology is a graphical representation of the connected entities. The topology illustrates the dependencies and connections between the entities. The Topology Viewer helps in identifying the dependencies that a selected entity has on other entities. Understanding these dependencies helps you to be cautious when deleting or undeploying an entity.

I’m really pleased with this new release and I’m looking forward to seeing more enhancements and improvements in future releases. As already mentioned, the product really deserves the name change, but I also hope it’s the last one for the next couple of years ;-). Oracle Stream Analytics simplifies stream processing and will enable Self Service Streaming Analytics applications for business people.

Find more information on Oracle Stream Analytics in the Documentation available here.

Stay tuned for an update on the Docker support I already had for Stream Explorer. I’m currently in the process of updating it for Oracle Stream Analytics so you can quickly set up your own playground environment.