Stream Processing - Better Knowledge Faster
March 19, 2018

Jim Sharpe | TrustRadius Reviewer
Score 8 out of 10
Vetted Review
Verified User

Software Version

Streams (On-Premises Version)

Overall Satisfaction with IBM Streams

IBM Streams allows me to solve problems for my clients that would be difficult, impossible, or too expensive to address with other technologies. Most of the application areas I have applied it to are real-time in nature, with a requirement for low end-to-end latency. Lately, a common use case has been using Streams to ingest and transform incoming events into a form better suited for long-term storage and subsequent analysis, such as model training. For example, ingesting complex nested JSON documents, then transforming and enriching that data into a flattened columnar format to be persisted in Spark, Event Store, Parquet files, etc. With this pattern, Streams is one element in an overall data processing pipeline in which multiple technologies are each employed to do what they do best. Some of the features of the IBM Streams platform are particularly well suited to implementing dynamic microservices that can be developed and deployed quickly, providing valuable agility for evolving problem spaces.
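
To make the ingest-and-flatten pattern above concrete, here is a minimal, illustrative Python sketch of the flattening step. The event shape and field names are made up, and in a real Streams job this logic would typically live inside an operator (for example, an SPL Functor or a Python map transform) rather than a standalone script.

    import json

    def flatten(obj, prefix=""):
        """Recursively flatten nested dicts into dotted, columnar-friendly keys."""
        flat = {}
        for key, value in obj.items():
            name = prefix + key
            if isinstance(value, dict):
                flat.update(flatten(value, prefix=name + "."))
            else:
                flat[name] = value
        return flat

    # Hypothetical nested sensor event.
    event = json.loads('{"device": {"id": "d42", "loc": {"lat": 40.7, "lon": -74.0}}, "temp": 21.5}')
    print(flatten(event))
    # -> {'device.id': 'd42', 'device.loc.lat': 40.7, 'device.loc.lon': -74.0, 'temp': 21.5}

The flattened records can then be handed off to whatever downstream sink persists them (Spark, Event Store, Parquet files, etc.).
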
  • IBM Streams is well suited for providing wire-speed real-time end-to-end processing with sub-millisecond latency.
  • Streams is amazingly computationally efficient. In other words, you can typically do much more processing on a given amount of hardware than with other technologies. In a recent linear-road benchmark, a Streams-based application provided greater capability than the Hadoop-based implementation while using 10x less hardware. So even when latency isn't critical, using Streams might still make sense to reduce operational cost.
  • Streams comes out of the box with a large and comprehensive set of tested and optimized toolkits. Leveraging these toolkits not only reduces development time and cost but also helps reduce project risk by eliminating the need for custom code, which likely has not seen as much time in test or production.
  • In addition to the out-of-the-box toolkits, there is an active developer community contributing additional specialized packages.
  • Streams applications can be developed in Python and Java, and there is also a visual programming interface. However, to get the absolute most out of the platform, in my opinion it's still best to develop applications in the proprietary SPL (Streams Processing Language); a minimal Python example is sketched after this list. Although SPL is a very effective language for stream processing, it does present a barrier to entry, one that should be lowered by the updated visual development tools currently being worked on.
  • Historically, Streams has allowed me to solve problems for clients that simply could not have been addressed by any other means, so the business benefit was being able to provide a solution at all for very challenging requirements. However, the relatively recent proliferation of stream processing platforms means there are now more options available that might meet a given set of requirements.
  • IBM Streams was a critical component in a data science processing pipeline allowing us to identify a potential biomarker in EEG recordings indicating whether traumatic brain injury patients are at risk for developing post-traumatic epilepsy. This is important for identifying which patients should or should not be included in drug studies.
  • Another successful project employed Streams as part of a pipeline for detecting and classifying sources in underwater acoustic data. Compared to other methods, the approach achieved simultaneously high levels of both sensitivity and discrimination. It also had the significant benefit of being able to detect signals that had not previously been observed.
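
Since the Python API is mentioned above, here is a minimal, hedged sketch of what a tiny Streams application looks like with the streamsx Python package. The source data and job name are made up, and submission details (standalone runtime vs. a Streams instance or cloud service) vary by environment.

    from streamsx.topology.topology import Topology
    import streamsx.topology.context as context

    # Declare a small topology: an in-memory source standing in for real ingest,
    # a filter, and a console sink.
    topo = Topology("ErrorFilter")
    lines = topo.source(lambda: ["error: disk", "ok", "error: net"])
    errors = lines.filter(lambda line: line.startswith("error"))
    errors.print()

    # Submit to a local standalone runtime; other context types target a
    # Streams instance or the cloud service.
    context.submit("STANDALONE", topo)

The same dataflow could equally be expressed in SPL; the Python API simply lowers the barrier to entry for a first application.
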
The selection of a stream processing platform depends heavily on the details of the requirements; there is no one right answer for all situations. However, IBM Streams typically has the advantage when sub-millisecond latency is important, complex analytics need to be performed on large volumes of data in motion, or minimizing operational costs is important (i.e., doing more work with less hardware).
Streams is a good fit for situations that require low end-to-end latency, involve complex real-time analytical processing of large, fast-moving data, or where reducing operational costs is important. However, it is very much a data-in-motion technology and is not well suited to situations, such as some forms of machine learning, where the entire historical data set needs to be operated on. Note that it's fairly common to use Streams to perform online scoring with models that were trained offline using other technologies; a sketch of that pattern follows.
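
As a hedged illustration of that online-scoring pattern (not the author's actual pipeline), the sketch below loads a model that was trained offline and applies it to each event as it arrives. The model path, feature field names, and the joblib/scikit-learn-style format are all assumptions.

    import joblib  # assumes an offline-trained, scikit-learn-style model saved with joblib

    class Scorer:
        """Loads a pre-trained model once, then scores each in-flight event."""

        def __init__(self, model_path="model.joblib"):  # hypothetical path
            self.model = joblib.load(model_path)

        def __call__(self, event):
            features = [[event["f1"], event["f2"]]]  # hypothetical feature fields
            event["score"] = float(self.model.predict(features)[0])
            return event

    # In a Streams Python topology this callable would typically be applied
    # per tuple, e.g. scored = events.map(Scorer()).

Loading the model once up front keeps the per-event work to a single predict call, which is what makes this pattern compatible with low-latency, data-in-motion processing.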