
Navigating the jungle of choices for scalable ML deployment

Discussion in 'Computer Science' started by Tfovid, Oct 8, 2018.

  Tfovid (Guest)

    I have prototyped a machine learning (ML) model on my local machine and would like to scale it to both train and serve on much larger datasets than would be feasible on a single machine. (The model was built with Python and Keras. It takes in a CSV table of inputs and spits out the corresponding CSV table of predicted outputs.)
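
The CSV-in/CSV-out interface described above can be sketched with the Python standard library alone. This is a minimal sketch, not the poster's code: `predict_row` is a hypothetical stand-in for the Keras model (in practice one would call `model.predict` on a whole batch), and the column names `x1`, `x2` are made up. Reading and writing in fixed-size chunks keeps memory bounded regardless of file size, and because each chunk is processed independently, the chunk is the natural unit of work to hand to separate machines later.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterable, Iterator, List, TextIO


def predict_row(features: List[float]) -> float:
    # Hypothetical stand-in for the Keras model: a fixed linear function.
    weights = [0.5, -1.0]
    return sum(w * x for w, x in zip(weights, features))


def chunks(rows: Iterable[Dict[str, str]], size: int) -> Iterator[List[Dict[str, str]]]:
    # Yield successive lists of at most `size` rows, never holding the whole file.
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def predict_csv(src: TextIO, dst: TextIO, feature_cols: List[str], chunk_size: int = 1000) -> int:
    # Read input rows from `src`, write one prediction per row to `dst`, return the row count.
    reader = csv.DictReader(src)
    writer = csv.writer(dst)
    writer.writerow(["prediction"])
    n_rows = 0
    for batch in chunks(reader, chunk_size):
        # Each batch is independent of the others; this is the unit of work
        # a cluster framework would hand to different machines.
        for row in batch:
            features = [float(row[col]) for col in feature_cols]
            writer.writerow([predict_row(features)])
            n_rows += 1
    return n_rows


if __name__ == "__main__":
    # Hypothetical files in practice: open("inputs.csv") / open("predictions.csv", "w", newline="").
    src = io.StringIO("x1,x2\n1,2\n3,4\n")
    dst = io.StringIO()
    predict_csv(src, dst, ["x1", "x2"], chunk_size=1)
    print(dst.getvalue())
```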

    My naive "vision" is that the model would reside on a single (master) machine, while the data would be distributed equally among several units (whatever "unit" means here: nodes in a "cluster", CPUs, GPUs, ...?). The model would be replicated onto these units, and the learned parameters would somehow all be synchronized back to the master unit. Similarly, for serving, the same model would be applied to the data residing on each of the units. Does this "vision" sound reasonable? (I have some experience with parallel computing using MPI, and I vaguely remember that this is how things work there.)
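
The "vision" above is essentially synchronous data parallelism: every unit holds a replica of the model and a shard of the data, computes a gradient on its shard, and the gradients are averaged before every replica applies the same update. Whether the averaging is done by a dedicated master (a parameter server) or by an all-reduce among peers, as with MPI, the arithmetic is the same. The sketch below simulates this in a single process, assuming a toy linear-regression model trained by full-batch gradient descent instead of a Keras network; all function names are hypothetical. With equal-sized shards the averaged gradient equals the full-batch gradient, so the result matches single-machine training. Serving is simpler still, since no synchronization is needed.

```python
from typing import List, Sequence, Tuple

Sample = Tuple[float, float]  # one (x, y) training example


def shard(items: Sequence, n_workers: int) -> List[list]:
    # Split the data into contiguous, (near-)equal shards, one per worker.
    if not items:
        return []
    size = -(-len(items) // n_workers)  # ceiling division
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def local_gradient(w: float, b: float, part: List[Sample]) -> Tuple[float, float]:
    # Gradient of the mean squared error of y_hat = w*x + b on one worker's shard.
    n = len(part)
    gw = sum(2 * (w * x + b - y) * x for x, y in part) / n
    gb = sum(2 * (w * x + b - y) for x, y in part) / n
    return gw, gb


def all_reduce_mean(grads: List[Tuple[float, float]]) -> Tuple[float, float]:
    # Average the per-worker gradients (MPI_Allreduce with MPI_SUM, then divide).
    k = len(grads)
    return sum(g[0] for g in grads) / k, sum(g[1] for g in grads) / k


def train_data_parallel(data: List[Sample], n_workers: int,
                        lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    # Assumes len(data) is divisible by n_workers, so the mean of shard means
    # equals the full-batch mean and the result matches single-machine training.
    shards = shard(data, n_workers)
    w, b = 0.0, 0.0  # every worker starts from the same replicated parameters
    for _ in range(steps):
        # On a real cluster each local_gradient call runs on a different unit.
        grads = [local_gradient(w, b, part) for part in shards]
        gw, gb = all_reduce_mean(grads)
        # Every replica applies the identical update, so the copies stay in sync.
        w, b = w - lr * gw, b - lr * gb
    return w, b


def predict_sharded(w: float, b: float, xs: Sequence[float], n_workers: int) -> List[float]:
    # Serving: each worker applies the same fixed parameters to its own shard;
    # contiguous shards mean concatenation restores the original order.
    results: List[float] = []
    for part in shard(xs, n_workers):
        results.extend(w * x + b for x in part)
    return results


if __name__ == "__main__":
    data = [(i / 4, 2 * (i / 4) + 1) for i in range(8)]  # toy points on y = 2x + 1
    w, b = train_data_parallel(data, n_workers=4)
    print(round(w, 3), round(b, 3))  # prints 2.0 1.0
```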

    If I were to start from a blank slate, what architecture/infrastructure should I choose to deploy my model in a scalable fashion? Below are some of the confusingly many options I have read about. (I barely understand what each of these does, so please forgive me if it looks like a laundry list of disparate technologies.)

    As a pure ML guy (read: a Python, Keras, and pandas guy coding on a laptop), I'm out of my depth with all the infrastructure jargon that comes with the above links. Finding a starting point, or some kind of "Hello World" example I could relate to, is therefore overwhelming. All I want is an architecture to which I can port my code and have it run efficiently and at scale. Which one does the job? Despite all the hype around ML, there does not seem to be any comprehensive map or comparative review of these solutions. I did find some comparisons, for example between Spark and SageMaker, or between Spark and Dask, but given my unfamiliarity with these subjects, they only add to the confusion.

