MLaaS Adventures

Wednesday, 16 November 2016

RBox24.com is live!

I'd like to announce that rbox24.com is live in beta!

Recover Abandoned Shopping Carts!

The system connects to a shop's back-end and monitors shopping behavior. If it detects that a shopping cart was abandoned, it sends emails with a promotion code and recommended products.

Every shop has its own flavor and specifics. Machine learning algorithms find the right way to recommend the most attractive products. The algorithms learn customers' behavior: they build behavioral profiles and find similarities between them. Recommendations based on these profiles boost sales.

A PrestaShop plugin is available now. More plugins are on the roadmap.

Saturday, 9 April 2016

Live US Jobs Stream

Live Jobs

Take a look at my web page showing a live stream of jobs in North America.

Jobs Stream and Description

On the right side of the page there is a live stream of jobs found in Twitter messages. The log refreshes automatically as new jobs are found.

See it all here.

Mechanics

The service uses a PubNub queue as its source of messages. A Python script subscribes to the queue and listens for Twitter messages. When a message arrives, the script asks the GEO clustering service which cluster the message belongs to. If the message belongs to the North America cluster, the script asks the text classification service whether it contains a job offer. If it does, the message is published to the output queue.

Anyone can now subscribe to the output queue, consume the job offers and show them on a web page. Many consumers can be subscribed to the queue at the same time.
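The routing chain described above can be sketched as a small function. This is a minimal sketch, not the author's script. The names geo_cluster, is_job_offer and publish are hypothetical stand-ins for the GEO clustering service, the text classification service and the output PubNub queue. They are passed in as plain functions, so the routing logic can be read and tested without a PubNub client. The cluster label and the toy classifiers in the usage part are also hypothetical.

```python
from typing import Callable, Iterable

NORTH_AMERICA = "north_america"  # hypothetical cluster label


def route_messages(
    messages: Iterable[dict],
    geo_cluster: Callable[[dict], str],
    is_job_offer: Callable[[str], bool],
    publish: Callable[[dict], None],
) -> int:
    """Forward North American job offers to the output queue.

    Returns the number of messages published.
    """
    published = 0
    for msg in messages:
        # 1. Ask the GEO clustering service which cluster the message belongs to.
        if geo_cluster(msg) != NORTH_AMERICA:
            continue
        # 2. Ask the text classification service whether it contains a job offer.
        if not is_job_offer(msg["text"]):
            continue
        # 3. Publish the job offer to the output queue.
        publish(msg)
        published += 1
    return published


if __name__ == "__main__":
    # Toy stand-ins for the real services, for illustration only.
    tweets = [
        {"text": "We are hiring a Scala developer in Boston", "lon": -71.06},
        {"text": "Great coffee this morning", "lon": -73.99},
        {"text": "Hiring a nurse in Berlin", "lon": 13.40},
    ]
    output_queue = []
    count = route_messages(
        tweets,
        geo_cluster=lambda m: NORTH_AMERICA if m["lon"] < -50 else "other",
        is_job_offer=lambda text: "hiring" in text.lower(),
        publish=output_queue.append,
    )
    print(count, output_queue)
```

Each service is consulted only if the previous one lets the message through. This mirrors the chain in the post: the classifier is never asked about messages outside the North America cluster.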

Conclusions

ML engines talking to PubNub queues make it very easy to build chains of services.
Any kind of text classification is at hand, which enables many use cases related to finding information in data streams.
Any kind of clustering is at hand too; for example, GEO heat maps are very easy to implement.

An engine extracting sentiment information could be very useful; adding it to the chain would give a lot of flexibility. Stay tuned.

Tuesday, 1 March 2016

PredictionIO Recommender Evaluation Tutorial

This tutorial explains how to evaluate the results generated by the PredictionIO/template-scala-parallel-ecommercerecommendation template. It also shows implementation details related to evaluation in the PredictionIO framework.

Results

The results are Jaccard similarity and Precision@K values. If we repeat the evaluation procedure, we can see whether our recommendation performance changes over time. Additionally, the evaluation loop helps us find the best algorithm parameters for our training data.

Resources

GitHub branch: https://github.com/goliasz/template-scala-parallel-ecommercerecommendation/tree/develpg
Tag: e0.7.0
PredictionIO official template page: http://templates.prediction.io/PredictionIO/template-scala-parallel-ecommercerecommendation

Step by Step Guide

I’m going to use a Docker image with PredictionIO to deploy the recommendation engine and evaluate its results. However, the evaluation works just as well in a “standard” PredictionIO installation.

Let us run a new Docker container with PredictionIO and Spark 1.5.5 preinstalled. Before running the container, I’m going to create a MyEngine folder, which I will share with the container as a volume. The recommendation engine will be deployed inside this folder and will be visible outside the container. Even if I drop the container, my engine will stay untouched and I won’t lose my deployed PIO application.

I’m using a CentOS 6.7 box for my deployment. Let us create the MyEngine folder inside my home directory.

mkdir MyEngine

Next, let us run the Docker container with PredictionIO.

docker run --hostname eval1 --name eval1 -it -v $HOME/MyEngine:/MyEngine goliasz/docker-predictionio /bin/bash

After running the docker container we are inside it with a bash root prompt.

root@eval1:/#

Now let us change to the MyEngine folder. We are going to create the PredictionIO engine inside it.

root@eval1:/# cd MyEngine
root@eval1:/MyEngine#

Now let us create a recommendation engine. We have to pull the right template from GitHub using the “pio template get” command.

root@eval1:/MyEngine# pio template get goliasz/template-scala-parallel-ecommercerecommendation --version "e0.7.0" ecorec0

The command will create a new folder, “ecorec0”, which will contain the recommendation engine implementation.

I’m giving the following responses to the command. The responses are used to configure the engine: the Scala sources inside will get their package name from my responses.

Please enter author's name: goliasz
Please enter the template's Scala package name (e.g. com.mycompany): com.kolibero
Please enter author's e-mail address: piotr.goliasz@kolibero.eu
Author's name: goliasz
Author's e-mail: piotr.goliasz@kolibero.eu
Author's organization: com.kolibero
Would you like to be informed about new bug fixes and security updates of this template? (Y/n) Y

Now we can start PredictionIO services.

root@eval1:/MyEngine# pio-start-all

After executing the command, we can build the engine using “pio build --verbose”. First we have to change the current folder to the new engine’s folder.

root@eval1:/MyEngine# cd ecorec0

Now we can build our engine.

root@eval1:/MyEngine/ecorec0# pio build --verbose

After the engine is built, we have to register it in the PIO framework.

root@eval1:/MyEngine/ecorec0# pio app new ecorec0

As a result we will get a new application ID and an access key. In my case the application ID is 1. We will use the access key to feed the event store with test data.

Now we have to customize the engine.json file with the proper application name.

root@eval1:/MyEngine/ecorec0# vi engine.json

My engine.json after customization looks as follows. The changed values are the two “appName” entries, both set to “ecorec0”.

{
  "id": "default",
  "description": "Default settings",
  "engineFactory": "com.kolibero.ECommerceRecommendationEngine",
  "datasource": {
    "params" : {
      "appName": "ecorec0"
    }
  },
  "algorithms": [
    {
      "name": "ecomm",
      "params": {
        "appName": "ecorec0",
        "unseenOnly": false,
        "seenEvents": ["buy", "view"],
        "similarEvents": ["view"],
        "rank": 10,
        "numIterations" : 20,
        "lambda": 0.01,
        "seed": 3
      }
    }
  ]
}

At this point we need training data. Let us generate it using the Python script delivered with the template. We have to use the access key to feed our event store with data.

root@eval1:/MyEngine/ecorec0# python data/import_eventserver.py --access_key VyhiNmp59j9qupci50M951IAqHsKVCvZXgMNhyn85crzbdaarYdz5OrnAY3JImxL

Note: the access key above is my value; yours will be different. You can check your application ID and access key with the “pio app list” command.

At this point we have data in our event store and we are almost ready to evaluate the recommendation engine. The last thing we need to do is to customize the Evaluation.scala file a little and rebuild the engine.

Let us open Evaluation.scala and change “INVALID_APP_NAME” to “ecorec0”. The changes have to be made at the end of the file.

root@eval1:/MyEngine/ecorec0# vi src/main/scala/Evaluation.scala

The file before the changes (the strings to replace are the two “INVALID_APP_NAME” values):

trait BaseEngineParamsList extends EngineParamsGenerator {
  protected val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(
      appName = "INVALID_APP_NAME",
      evalParams = Some(DataSourceEvalParams(kFold = 2, queryNum = 5, buyTestScore = 10.0, viewTestScore = 1.0))))
}

object EngineParamsList extends BaseEngineParamsList {
  engineParamsList = for(
    rank <- Seq(10);
    numIterations <- Seq(20);
    lambda <- Seq(0.01))
    yield baseEP.copy(
      algorithmParamsList = Seq(
        ("ecomm", ECommAlgorithmParams("INVALID_APP_NAME", false, List("buy", "view"), List("view"), rank, numIterations, lambda, Option(3)))) )
}

The file after the changes:

trait BaseEngineParamsList extends EngineParamsGenerator {
  protected val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(
      appName = "ecorec0",
      evalParams = Some(DataSourceEvalParams(kFold = 2, queryNum = 5, buyTestScore = 10.0, viewTestScore = 1.0))))
}

object EngineParamsList extends BaseEngineParamsList {
  engineParamsList = for(
    rank <- Seq(10);
    numIterations <- Seq(20);
    lambda <- Seq(0.01))
    yield baseEP.copy(
      algorithmParamsList = Seq(
        ("ecomm", ECommAlgorithmParams("ecorec0", false, List("buy", "view"), List("view"), rank, numIterations, lambda, Option(3)))) )
}

Now let us rebuild the engine.

root@eval1:/MyEngine/ecorec0# pio build --verbose

At this point we can run the evaluation. Notice that I’m using the “com.kolibero” package name in the following command. If you entered a different package name when creating the engine from the template, use your own.

root@eval1:/MyEngine/ecorec0# pio eval com.kolibero.RecommendationEvaluation com.kolibero.EngineParamsList

You should see a result similar to the following:

[INFO] [Jaccard] user: u2, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u2, jc: 0.3333333333333333, ars: 7, prs: 5
[INFO] [Jaccard] user: u4, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u4, jc: 0.3, ars: 8, prs: 5
[INFO] [Jaccard] user: u5, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u9, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u10, jc: 0.08333333333333333, ars: 8, prs: 5
[INFO] [Jaccard] user: u3, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u5, jc: 0.3, ars: 8, prs: 5
[INFO] [Jaccard] user: u9, jc: 0.2222222222222222, ars: 6, prs: 5
[INFO] [Jaccard] user: u7, jc: 0.1, ars: 6, prs: 5
[INFO] [Jaccard] user: u10, jc: 0.4, ars: 9, prs: 5
[INFO] [Jaccard] user: u1, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u3, jc: 0.5, ars: 7, prs: 5
[INFO] [Jaccard] user: u6, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u7, jc: 0.2, ars: 7, prs: 5
[INFO] [Jaccard] user: u8, jc: 0.08333333333333333, ars: 8, prs: 5
[INFO] [Jaccard] user: u1, jc: 0.2, ars: 7, prs: 5
[INFO] [Jaccard] user: u6, jc: 0.1, ars: 6, prs: 5
[INFO] [Jaccard] user: u8, jc: 0.3, ars: 8, prs: 5
[INFO] [MetricEvaluator] Iteration 0
[INFO] [MetricEvaluator] EngineParams: {"dataSourceParams":{"":{"appName":"ecorec0","evalParams":{"kFold":2,"queryNum":5,"buyTestScore":10.0,"viewTestScore":1.0}}},"preparatorParams":{"":{}},"algorithmParamsList":[{"ecomm":{"appName":"ecorec0","unseenOnly":false,"seenEvents":["buy","view"],"similarEvents":["view"],"rank":10,"numIterations":20,"lambda":0.01,"seed":3}}],"servingParams":{"":{}}}
[INFO] [MetricEvaluator] Result: MetricScores(0.16974747474747473,List(7.2, 0.22430555555555554))
[INFO] [CoreWorkflow$] Updating evaluation instance with result: MetricEvaluatorResult:
# engine params evaluated: 1
Optimal Engine Params:
{
"dataSourceParams":{
    "":{
      "appName":"ecorec0",
      "evalParams":{
        "kFold":2,
        "queryNum":5,
        "buyTestScore":10.0,
        "viewTestScore":1.0
      }
    }
},
"preparatorParams":{
    "":{

    }
},
"algorithmParamsList":[{
    "ecomm":{
      "appName":"ecorec0",
      "unseenOnly":false,
      "seenEvents":["buy","view"],
      "similarEvents":["view"],
      "rank":10,
      "numIterations":20,
      "lambda":0.01,
      "seed":3
    }
}],
"servingParams":{
    "":{

    }
}
}
Metrics:
Jaccard (scoreThreshold=1.0): 0.16974747474747473
PositiveCount (threshold=1.0): 7.2
Precision@K (k=10, threshold=1.0): 0.22430555555555554
[INFO] [CoreWorkflow$] runEvaluation completed

At the end of the output you can see the Jaccard metric together with the PositiveCount and Precision@K metrics. These results will vary with your training data.
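The summary figures can be checked against the per-query log lines. This is a minimal sketch with two assumptions. First, the reported Jaccard score is the plain mean of the logged "jc" values. Second, PositiveCount is the mean of the "ars" values, because every test item passes scoreThreshold = 1.0. With the twenty lines from the sample log above, this gives a mean Jaccard of about 0.169747 and a mean positive count of 7.2, matching the Metrics section. Precision@K cannot be recomputed from these lines, because they do not log per-query hits.

```python
import re

# Per-query lines copied from the sample output above.
SAMPLE_LOG = """\
[INFO] [Jaccard] user: u2, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u2, jc: 0.3333333333333333, ars: 7, prs: 5
[INFO] [Jaccard] user: u4, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u4, jc: 0.3, ars: 8, prs: 5
[INFO] [Jaccard] user: u5, jc: 0.09090909090909091, ars: 7, prs: 5
[INFO] [Jaccard] user: u9, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u10, jc: 0.08333333333333333, ars: 8, prs: 5
[INFO] [Jaccard] user: u3, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u5, jc: 0.3, ars: 8, prs: 5
[INFO] [Jaccard] user: u9, jc: 0.2222222222222222, ars: 6, prs: 5
[INFO] [Jaccard] user: u7, jc: 0.1, ars: 6, prs: 5
[INFO] [Jaccard] user: u10, jc: 0.4, ars: 9, prs: 5
[INFO] [Jaccard] user: u1, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u3, jc: 0.5, ars: 7, prs: 5
[INFO] [Jaccard] user: u6, jc: 0.0, ars: 7, prs: 5
[INFO] [Jaccard] user: u7, jc: 0.2, ars: 7, prs: 5
[INFO] [Jaccard] user: u8, jc: 0.08333333333333333, ars: 8, prs: 5
[INFO] [Jaccard] user: u1, jc: 0.2, ars: 7, prs: 5
[INFO] [Jaccard] user: u6, jc: 0.1, ars: 6, prs: 5
[INFO] [Jaccard] user: u8, jc: 0.3, ars: 8, prs: 5
"""

# "ars"/"prs" are assumed to be the actual and predicted result set sizes.
LINE = re.compile(r"user: (\w+), jc: ([0-9.]+), ars: (\d+), prs: (\d+)")


def summarize(log: str):
    """Return (number of queries, mean Jaccard, mean positive count)."""
    rows = LINE.findall(log)
    if not rows:
        raise ValueError("no Jaccard lines found")
    jaccards = [float(jc) for _, jc, _, _ in rows]
    actual_sizes = [int(ars) for _, _, ars, _ in rows]
    return len(rows), sum(jaccards) / len(rows), sum(actual_sizes) / len(rows)


if __name__ == "__main__":
    n, mean_jc, mean_positive = summarize(SAMPLE_LOG)
    print(n, round(mean_jc, 6), mean_positive)
```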

For the evaluation we used the RecommendationEvaluation and EngineParamsList objects. Both are declared in the Evaluation.scala file.

EngineParamsList

Declared in Evaluation.scala.

Contents:

trait BaseEngineParamsList extends EngineParamsGenerator {
  protected val baseEP = EngineParams(
    dataSourceParams = DataSourceParams(
      appName = "ecorec0",
      evalParams = Some(DataSourceEvalParams(kFold = 2, queryNum = 5, buyTestScore = 10.0, viewTestScore = 1.0))))
}

object EngineParamsList extends BaseEngineParamsList {
  engineParamsList = for(
    rank <- Seq(10);
    numIterations <- Seq(20);
    lambda <- Seq(0.01))
    yield baseEP.copy(
      algorithmParamsList = Seq(
        ("ecomm", ECommAlgorithmParams("ecorec0", false, List("buy", "view"), List("view"), rank, numIterations, lambda, Option(3)))) )
}

EngineParamsList inherits from BaseEngineParamsList. BaseEngineParamsList declares the basic parameters used to produce test data for evaluation. The test data are produced by the readEval method of the DataSource class in DataSource.scala. Here we have the following parameters:

kFold = 2,
queryNum = 5,
buyTestScore = 10.0,
viewTestScore = 1.0

kFold - data taken from the event store are split into training and test data kFold times. Each event gets a unique index number, and the index modulo kFold decides in which fold the event is used as test data.
queryNum - evaluation queries ask for 5 items
buyTestScore - score assigned to test data delivered to the evaluation algorithm for buy events. See DataSource.readEval.
viewTestScore - score assigned to test data delivered to the evaluation algorithm for view events. See DataSource.readEval.
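The modulo-based split behind kFold can be sketched in plain Python. This is a minimal sketch that assumes the common PredictionIO evaluation pattern: every event gets a unique index (in Spark, one way to do this is RDD.zipWithUniqueId). For fold number f, events with index % kFold == f become test data and all other events become training data. Plain lists stand in for the RDDs used in DataSource.readEval, so check readEval for the template's exact logic.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def k_fold_split(events: Sequence[T], k_fold: int) -> List[Tuple[List[T], List[T]]]:
    """Return one (training, test) pair per fold, assigning events by index modulo k_fold."""
    if k_fold < 2:
        raise ValueError("k_fold must be at least 2")
    indexed = list(enumerate(events))  # (unique index, event) pairs
    folds = []
    for fold in range(k_fold):  # same range as Scala's `0 until kFold`
        training = [e for i, e in indexed if i % k_fold != fold]
        test = [e for i, e in indexed if i % k_fold == fold]
        folds.append((training, test))
    return folds


if __name__ == "__main__":
    for fold, (training, test) in enumerate(k_fold_split(["e0", "e1", "e2", "e3", "e4"], 2)):
        print(fold, training, test)
```

With kFold = 2, every event is used for testing exactly once and for training once. The evaluation therefore runs two train/test rounds over the same data.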

EngineParamsList declares the evaluation loop. In this case we have a loop with only one set of engine parameters. As the implementation of EngineParamsList shows, the evaluation will be executed for the following parameter set:

rank = 10
numIterations = 20
lambda = 0.01
unseenOnly = false
seenEvents = buy, view
similarEvents = view
seed = 3

To evaluate several parameter sets, extend the sequences in the for comprehension, for example rank <- Seq(5, 10).
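The for comprehension in EngineParamsList is a Cartesian product, so every value added to one sequence multiplies the number of parameter sets that get evaluated. The Python analogue below is a sketch. The extra values (rank 5, lambda 0.1) are hypothetical and would correspond to rank <- Seq(5, 10) and lambda <- Seq(0.01, 0.1) in Scala. The fixed parameters are copied from ECommAlgorithmParams in EngineParamsList.

```python
from itertools import product

# Fixed algorithm parameters, copied from ECommAlgorithmParams in EngineParamsList.
BASE_PARAMS = {
    "appName": "ecorec0",
    "unseenOnly": False,
    "seenEvents": ["buy", "view"],
    "similarEvents": ["view"],
    "seed": 3,
}


def engine_params_list(ranks, num_iterations, lambdas):
    """Yield one parameter set per combination, like the Scala for comprehension."""
    for rank, iterations, lam in product(ranks, num_iterations, lambdas):
        # Copy the fixed parameters and add the swept ones (mirrors baseEP.copy).
        yield {**BASE_PARAMS, "rank": rank, "numIterations": iterations, "lambda": lam}


if __name__ == "__main__":
    # Hypothetical grid: 2 ranks x 1 iteration count x 2 lambdas = 4 parameter sets.
    for params in engine_params_list([5, 10], [20], [0.01, 0.1]):
        print(params["rank"], params["numIterations"], params["lambda"])
```

With more than one set, the evaluation output reports the count under “# engine params evaluated” and the winning set under “Optimal Engine Params”, as in the sample result above.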

RecommendationEvaluation

Declared in Evaluation.scala.

object RecommendationEvaluation extends Evaluation {
  engineEvaluator = (
    ECommerceRecommendationEngine(),
    MetricEvaluator(
      metric = Jaccard(scoreThreshold = 1.0),
      otherMetrics = Seq(
        PositiveCount(scoreThreshold = 1.0),
        PrecisionAtK(k = 10, scoreThreshold = 1.0)
      )
    )
  )
}

The RecommendationEvaluation object defines which metrics are used for evaluation. The main metric is the Jaccard class with scoreThreshold = 1.0. Because the lowest test score is 1.0 (viewTestScore), a threshold of 1.0 means that all data delivered for testing are taken into account. Scores are assigned to test data in BaseEngineParamsList, in the evalParams declaration:

evalParams = Some(DataSourceEvalParams(kFold = 2, queryNum = 5, buyTestScore = 10.0, viewTestScore = 1.0))

This means that the data source will produce evaluation data with kFold = 2 (see the readEval method in DataSource.scala). Evaluation queries will ask for 5 results. Items related to buy events get a score of 10.0, and items related to view events get a score of 1.0.

RecommendationEvaluation also defines secondary metrics: PrecisionAtK and PositiveCount. Both take parameters; see their implementations in Evaluation.scala. k limits how many predicted results are considered, and scoreThreshold decides which test events count as positives.
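The two secondary metrics can be sketched for a single query. This is a sketch under assumptions, not the template's code. Actual results are taken to be (item, score) pairs built from test events (buy = 10.0, view = 1.0), and an item is a positive when its score >= scoreThreshold. The Precision@K denominator, min(k, number of positives), follows PredictionIO's recommendation-template evaluation example. Check PrecisionAtK in Evaluation.scala for the exact definition used in this fork.

```python
from typing import List, Optional, Sequence, Set, Tuple

ActualResult = Sequence[Tuple[str, float]]  # (item, test score) pairs


def positive_items(actual: ActualResult, score_threshold: float) -> Set[str]:
    """Items whose test score reaches the threshold count as positives."""
    return {item for item, score in actual if score >= score_threshold}


def positive_count(actual: ActualResult, score_threshold: float = 1.0) -> int:
    """PositiveCount for one query: how many actual items are positives."""
    return len(positive_items(actual, score_threshold))


def precision_at_k(
    predicted: List[str], actual: ActualResult, k: int, score_threshold: float = 1.0
) -> Optional[float]:
    """Precision@K for one query; None when there are no positives (undefined)."""
    if k <= 0:
        raise ValueError("k must be positive")
    positives = positive_items(actual, score_threshold)
    if not positives:
        return None
    # Only the first k predicted items are considered.
    hits = sum(1 for item in predicted[:k] if item in positives)
    return hits / min(k, len(positives))


if __name__ == "__main__":
    # Hypothetical query: one bought item (10.0) and two viewed items (1.0).
    actual = [("i1", 10.0), ("i2", 1.0), ("i3", 1.0)]
    predicted = ["i1", "i4", "i2", "i5", "i6"]
    print(positive_count(actual), precision_at_k(predicted, actual, k=10))
    print(positive_count(actual, 10.0), precision_at_k(predicted, actual, k=10, score_threshold=10.0))
```

Raising scoreThreshold to 10.0 would count only bought items as positives. This is one way to measure how well the engine predicts purchases rather than views.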

Evaluation Code

DataSource.scala

class DataSource, method readEval

This method prepares the training and test data for evaluation.

Training/test data preparation steps:

Get all users - usersRDD variable. Data from $set events.
Get all items - itemsRDD variable. Data from $set events.
Get all events - eventsRDD variable. Data from “buy” and “view” events.
Loop over fold indices 0 until kFold (that is, from 0 to kFold-1):

Split data into training and test events, using modulo to assign events to folds.
Prepare training view events - trainingViewEventsRDD variable
Prepare training buy events - trainingBuyEventsRDD variable
Prepare test view events - testViewEventsRDD variable
Prepare test buy events - testBuyEventsRDD variable
Create a list of test users, which the evaluation loop will use to send queries (testingUsers variable)
Create test item scores consisting of user id, item id and score (testItemScores variable)
Build result tuples containing the training data, an empty evaluation info and a list of test queries with actual results. See the readEval declaration ( Seq[(TrainingData, EmptyEvaluationInfo, RDD[(Query, ActualResult)])] ).
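The last steps of the loop, turning one fold's test events into (query, actual result) pairs, can be sketched in plain Python. This is a minimal sketch under these assumptions:

- Test events are (user, event, item) tuples.
- Buy events are scored with buyTestScore and view events with viewTestScore.
- A user who both viewed and bought an item keeps the higher score.
- A query is reduced to a user id plus queryNum.

The function and field names are hypothetical. The template's Scala Query and ActualResult types may carry different fields.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, str]  # (user id, event name, item id)


def build_test_queries(
    test_events: Iterable[Event],
    query_num: int = 5,
    buy_test_score: float = 10.0,
    view_test_score: float = 1.0,
) -> List[Tuple[dict, List[Tuple[str, float]]]]:
    """Return (query, actual item scores) pairs, one per test user."""
    score_for = {"buy": buy_test_score, "view": view_test_score}
    item_scores: Dict[str, Dict[str, float]] = defaultdict(dict)
    for user, event, item in test_events:
        score = score_for.get(event)
        if score is None:
            continue  # only buy and view events are scored
        # Keep the highest score if the user both viewed and bought the item.
        item_scores[user][item] = max(item_scores[user].get(item, 0.0), score)

    pairs = []
    for user in sorted(item_scores):  # the test users
        query = {"user": user, "num": query_num}
        # Actual results sorted by score (highest first), then by item id.
        actual = sorted(item_scores[user].items(), key=lambda kv: (-kv[1], kv[0]))
        pairs.append((query, actual))
    return pairs


if __name__ == "__main__":
    events = [("u1", "view", "i1"), ("u1", "buy", "i1"), ("u1", "view", "i2"), ("u2", "view", "i3")]
    for query, actual in build_test_queries(events):
        print(query, actual)
```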

Evaluation.scala

case class PrecisionAtK

Implements the Precision@K metric for evaluating query results.

case class Jaccard

Implements the Jaccard metric for evaluating query results.

def jaccardValue(A: Set[String], B: Set[String]): Double = {
  // Jaccard similarity: size of the intersection divided by size of the union
  A.intersect(B).size.toDouble / A.union(B).size.toDouble
}
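Here is the same formula in Python, with a worked check against the sample log. Assume "ars" and "prs" are the sizes of the actual and predicted item sets. Then the first log line (jc 0.0909..., ars 7, prs 5) means exactly one shared item: 1 / (7 + 5 - 1) = 1/11. Likewise, jc 0.3 with ars 8 and prs 5 means three shared items: 3 / (8 + 5 - 3). The guard for two empty sets is an addition of this sketch; the Scala version returns NaN (0.0 / 0.0) in that case.

```python
from typing import Set


def jaccard_value(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union.

    Returns 0.0 for two empty sets (a convention assumed by this sketch).
    """
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical sets sized like the first log line: ars 7, prs 5, one shared item.
    actual = {f"i{n}" for n in range(1, 8)}      # i1..i7
    predicted = {"i1", "i8", "i9", "i10", "i11"}  # only i1 overlaps
    print(jaccard_value(actual, predicted))       # 1 / 11
```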

case class PositiveCount

Implements the PositiveCount metric for evaluating query results.

object RecommendationEvaluation

Defines the list of metrics used by the evaluation.

trait BaseEngineParamsList

Defines the basic parameters used to prepare test data for evaluation.

object EngineParamsList

Defines an evaluation loop. The loop creates engine parameter sets.

Sunday, 14 February 2016

Text Classification as a Service

Let us imagine we have a service collecting unstructured textual data from our partners. We collect that data and build service directories out of it.
How can we keep our data clean and tidy without investing lots of money in expensive MDM platforms? We can use text classification services.

Text classification services based on machine learning keep track of incoming data and help categorize it in a fully automatic way. They use advanced text matching algorithms to correlate and clean data.

Text Classification Engine

TC Services Availability

You can have your own text classification service on demand. The service is delivered via a PubNub queue. It can be started up in minutes and serve your needs for as long as you wish.

The services are built on top of PredictionIO technology and use PubNub queues as the transport medium. In most cases the core components of the engines are open source: they are templates developed by the growing community of PredictionIO developers.

Text classification engines run as Docker containers. This technology makes it possible to create new engine instances in minutes in any environment running the Docker service. This means they can run in the AWS cloud, on your back-end servers or even on your laptop running a Linux VM.

What do you need to have your own Text Classification Service?

You need your own PubNub queue. PubNub queues are free below 1 million messages sent. See my previous post, Recommendation Engine in Docker Container, for instructions on how to set up a PubNub queue.
Use the goliasz/tcaas-micro Docker image to spin up your own container, or ask for help (KOLIBERO).

Conclusions

Text classification services are somewhere out there in the cloud, but they can be yours with very little effort.
You don't need your own hardware to get text classified. You can just order a classification engine and use it via PubNub queues.
Such distributed services scale together with business growth. The cloud has no borders or limits, and you can have as many engines as you can imagine.

Resources

goliasz/tcaas Docker Image
TCaaS Description
TCaaS Demo
goliasz/pio-template-text-similarity PredictionIO Template
PubNub
PredictionIO

Tuesday, 2 February 2016

Recommendation Engine in Docker Container!

Check out my Docker container with a recommendation engine serving recommendations via PubNub queues!

You can find an introduction to the idea of Subscribe-Serve and Subscribe-Get Service in my previous post, Recommendation as a Microservice.

Show Time

How do you run your own recommendation micro-service? It's very easy. You can do it in just a few simple steps.

Steps Summary

1. Pull the Docker image.
2. Create a PubNub account and queue.
3. Start the Docker container.
4. Train.
5. Ask for recommendations.

Detailed Instructions

1. Pull docker image.

docker pull goliasz/raas-micro:1.1

2. Create your account and first queue in PubNub.

Go to the PubNub home page.
Register. The simplest way is to use your Google account.
Create your PubNub app.

PubNub application with publish and subscribe keys assigned

Once you have created your PubNub application, it has a Publish Key and a Subscribe Key assigned.

Start Debug Console

Debug console before adding clients

Choose your channel name
Add two clients

Two queue clients added. First maximized.

3. Start your Docker container using your Subscribe Key, Publish Key and channel name.

docker run -dt --hostname reco1 --name reco1 -e "PN_PUBKEY=pub-c-1113-demo-3" -e "PN_SUBKEY=sub-c-1f1a-demo" -e "PN_CHANNEL=Channel-mydemo-154" goliasz/raas-micro:1.1 /MyEngine/autostart.sh

Wait about two minutes; you should then see readiness messages in your PubNub queue.

Readiness messages

You should see three messages.

{
  "msg": "training ready",
  "rtype": "info"
}

{
  "msg": "query ready",
  "rtype": "info"
}

{
  "msg": "service ready",
  "rtype": "info"
}

4. Train your recommender engine with some data.

Copy and paste the training messages below into the PubNub client window one by one, and click "Send" after each message.

{
  "event": "purchase",
  "entityType": "user",
  "entityId": "u1",
  "targetEntityType": "item",
  "targetEntityId": "Iphone 6",
  "rtype": "train"
}

Click "Send"

{
  "event": "view",
  "entityType": "user",
  "entityId": "U 2",
  "targetEntityType": "item",
  "targetEntityId": "Phones",
  "rtype": "train"
}

Click "Send"

{
  "event": "$set",
  "entityType": "item",
  "entityId": "Galaxy",
  "properties": {
    "categories": [
      "Phones",
      "Electronics",
      "Samsung"
    ]
  },
  "rtype": "train"
}

Click "Send"

After sending the first message, you should see it repeated in the second client window.

Training message repeated in second client window.

After sending all training messages

Now you have to instruct the engine to train its recommendation model. Send the service message below.

{
  "cmd": "retrain",
  "rtype": "service"
}

Service "Retrain" message sent

Wait 3 or 4 minutes, then ask for your recommendations.
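If you would rather generate the training and retrain messages than type them by hand, they can be built as plain JSON. The field names and "rtype" values are copied from the example messages in this post. The helper functions themselves are hypothetical conveniences, and actually publishing the messages (with the debug console or a PubNub client library) is left out of the sketch.

```python
import json
from typing import Optional


def train_message(
    event: str,
    entity_id: str,
    entity_type: str = "user",
    target_item: Optional[str] = None,
    properties: Optional[dict] = None,
) -> dict:
    """Training message in the format of the examples above (rtype "train")."""
    msg = {"event": event, "entityType": entity_type, "entityId": entity_id}
    if target_item is not None:
        msg["targetEntityType"] = "item"
        msg["targetEntityId"] = target_item
    if properties is not None:
        msg["properties"] = properties
    msg["rtype"] = "train"
    return msg


def retrain_message() -> dict:
    """Service message asking the engine to retrain its model."""
    return {"cmd": "retrain", "rtype": "service"}


if __name__ == "__main__":
    messages = [
        train_message("purchase", "u1", target_item="Iphone 6"),
        train_message("view", "U 2", target_item="Phones"),
        train_message("$set", "Galaxy", entity_type="item",
                      properties={"categories": ["Phones", "Electronics", "Samsung"]}),
        retrain_message(),
    ]
    for m in messages:
        # Each printed line is one message, ready to paste into the PubNub debug console.
        print(json.dumps(m))
```

Building messages with json.dumps also avoids hand-editing mistakes such as a missing closing brace or comma.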

5. Ask for recommendations

Just send a query message.

{
  "user": "u1",
  "item": "Iphone 5",
  "num": 5,
  "rtype": "query"
}

Query message and response with recommendation

You should receive a message with recommendations.

Example:

{
  "itemScores": [
    {
      "item": "Galaxy",
      "score": 1.3233743906021118
    }
  ],
  "rtype": "response"
}
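All messages travel on the same channel: training messages are echoed back, and readiness messages arrive with rtype "info". A client therefore has to pick out the responses. The sketch below builds a query in the format above and extracts (item, score) pairs from a response by checking "rtype". The field names come from the example messages; the functions are hypothetical helpers, and receiving messages from PubNub is not shown.

```python
import json
from typing import List, Optional, Tuple


def query_message(user: str, item: str, num: int = 5) -> dict:
    """Query message in the format of the example above (rtype "query")."""
    return {"user": user, "item": item, "num": num, "rtype": "query"}


def parse_response(raw: str) -> Optional[List[Tuple[str, float]]]:
    """Return (item, score) pairs from a response, or None for other messages."""
    msg = json.loads(raw)
    if msg.get("rtype") != "response":
        return None  # e.g. echoed queries or "info" readiness messages
    return [(s["item"], s["score"]) for s in msg.get("itemScores", [])]


if __name__ == "__main__":
    print(json.dumps(query_message("u1", "Iphone 5")))
    example = '{"itemScores": [{"item": "Galaxy", "score": 1.3233743906021118}], "rtype": "response"}'
    print(parse_response(example))
```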

Congratulations! You have your own recommender engine in the Subscribe-Serve architecture!

Conclusions

Nothing stops you from having your own recommender engine.
It is easy!

Do you have any problems? Just call me. Contact details here http://kolibero.eu/contact


Saturday, 30 January 2016

Recommendation as a Microservice

Successful e-businesses use recommendations, and this has become a standard. Good recommendations can increase the conversion rate significantly. In such cases recommendation engines are worth the money spent. But they don't need to be expensive.

Low cost recommendations for everybody

Machine learning platforms are now mature enough to make recommendations a simple and cheap service.

How about exposing recommendation engines via secure queues available to every business and organization? Such queues are already available. A great example is PubNub: a simple and secure queueing technology designed for the new Internet of Things era.

Subscribe - Get Service

How about just subscribing to a secure queue and receiving recommendations in 200 ms? Secure PubNub queues are available right now, and you can have your own queue in just a couple of clicks. What if on the other side of the queue there is a micro-service delivering recommendations on demand: your personal instance of a recommendation engine? The technology is available right now.

See my recommendation engine demo serving recommendations via queue.

Subscribe - Train

To get recommendations, you need to train your engine with data related to your business. How? Again, a secure queue is the solution. Just subscribe to your queue and publish fully anonymous data (just ids) to your personal instance of the recommendation engine.

Training of Recommendation Engine.

Subscribe - Serve

The trained machine learning engine simply subscribes back to your queue and serves personalized recommendations tailored to your needs.

Subscribe - Serve Recommendations

The engine can be deployed anywhere: in-house or in a cloud such as Amazon's AWS. It can be rented for some time or for a finite number of recommendations. Deployment models are fully flexible. Just imagine your solution!

Endless Possibilities

And now, what if we need another service to answer another question? Some examples: How long is my customer going to stay with me? When is the customer going to buy the product again? What can I do to make him or her stay?

Conclusions

Good recommendations are not a luxury. They are at hand: a great service at low cost.
You don't need data scientists or big IT departments to have them.
You can integrate them with your platforms in a very easy and secure way.