SnapLogic Data Science brings self-service to machine learning


Today, we announced the launch of SnapLogic Data Science, a visual self-service solution for the entire machine learning (ML) lifecycle. SnapLogic Data Science, together with SnapLogic’s award-winning integration platform, the Enterprise Integration Cloud, supports data sourcing, data preparation and feature engineering, and the training, validation, and deployment of machine learning models all in one platform. It’s the first of its kind.

Why we built our data science solution

Machine learning projects usually involve several people – data engineers, data scientists, DevOps, and others. Conventionally, at each stage of the ML lifecycle, most, if not all, of these people end up doing an excess of custom coding, redundant work, and manual trial and error.

In the absence of automation, a data engineer might spend days trudging through integration activities like gathering, cleaning, and transforming data, all using code. Or worse, the data scientist will get stuck doing this work. At the tail end of the ML lifecycle, a software development team has to translate the final model to a different programming language to put it into production. All of these challenges jeopardize the success of your ML project.

At SnapLogic, we know many of these challenges all too well. We experienced them first-hand when we built the Iris AI Integration Assistant, an ML-driven recommendation engine that offers contextual Snap suggestions when building integrations. We left that experience with one big takeaway: machine learning shouldn’t involve so much code-heavy, redundant work; in fact, it needs the kind of self-service capabilities for which SnapLogic is well known. This is why we built SnapLogic Data Science.

What are the benefits of SnapLogic Data Science?

SnapLogic Data Science brings our proven self-service productivity to the entire machine learning lifecycle, decreasing the time-to-value of your ML initiatives. It simplifies data cleansing tasks, boosts your productivity during the model development process, and enables you to deploy your model as soon as it’s ready.

SnapLogic Data Science accelerates and simplifies the four key stages of the machine learning lifecycle:

  • Data acquisition
  • Data exploration and preparation
  • Model training and validation
  • Model deployment

1) Data acquisition

SnapLogic Data Science makes it easy to retrieve raw data for your training datasets. Instead of writing code or asking IT for one-time data dumps, data engineers can consume all kinds of data via simple drag and drop. SnapLogic Data Science lets you easily integrate a variety of endpoints – relational and NoSQL databases, cloud applications, data lakes, JSON files, and more – when developing an ML model.

2) Data exploration and preparation

SnapLogic Data Science enables data engineers and data scientists to easily filter sensitive information, transform data, map fields, and perform other data preparation tasks in a productive low-code environment. What’s more, SnapLogic Data Science offers new Snaps and pre-built data pipelines for operations specific to machine learning. For example, the Categorical-to-Numeric Snap in the ML Data Preparation Snap Pack lets you convert categorical data (e.g., small, medium, large) to numeric data using integer or one-hot encoding with just a few clicks.
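To illustrate what the Categorical-to-Numeric Snap does under the hood, here is a minimal sketch of integer and one-hot encoding in plain Python using pandas (the column name and sample values are hypothetical, chosen to match the small/medium/large example above):

```python
import pandas as pd

# Hypothetical categorical column, like the small/medium/large example
df = pd.DataFrame({"size": ["small", "medium", "large", "small"]})

# Integer encoding: map each category to an integer code
df["size_int"] = df["size"].astype("category").cat.codes

# One-hot encoding: one binary indicator column per category
one_hot = pd.get_dummies(df["size"], prefix="size")
print(df.join(one_hot))
```

The Snap wraps this kind of transformation behind a few clicks instead of code.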

3) Model training and validation

SnapLogic Data Science not only speeds up the creation of training datasets; it also accelerates the training and validation of models. Data scientists can configure models using Snaps, further cutting down on scripting. For example, the Snaps for regression models (e.g., the Predictor – Regression Snap) implement several state-of-the-art algorithms (e.g., linear regression) based on mature open-source libraries that data scientists can use when building a model.
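For reference, the linear regression such Snaps expose is the same technique available in open-source libraries like scikit-learn. A minimal sketch (the tiny dataset here is made up purely for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy training data following y = 2x + 1, standing in for a prepared dataset
X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()
model.fit(X, y)

# Predict for a new input
pred = model.predict([[5.0]])
```

A configurable Snap lets you get the same result without writing this code by hand.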

But if you’d rather build a model from scratch, with the full flexibility of Python and Jupyter Notebooks, SnapLogic Data Science supports that too. Simply write native Python in Jupyter and publish the scripts directly into a SnapLogic pipeline to operationalize them.

SnapLogic Data Science also enables you to validate your model with cross-validation Snaps – a quick and easy process.
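Cross-validation itself is a standard technique: the training data is split into k folds, and the model is repeatedly trained on k-1 folds and scored on the held-out fold. A sketch with scikit-learn, using its bundled Iris dataset purely as an example:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# 5-fold cross-validation: five accuracy scores, one per held-out fold
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

The cross-validation Snaps package this evaluation loop so it can be run without scripting.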

4) Model deployment (production)

Deploying a model is often a slow, cumbersome process. In many cases, to get a model working in the real world, developers have to translate it into another programming language. SnapLogic Data Science removes this translation phase and lets you deploy your model as an API as soon as it’s ready. What’s more, SnapLogic Data Science automates many of the steps involved in continuously retraining your model, helping ensure its prediction accuracy improves over time.


SnapLogic Data Science makes end-to-end machine learning accessible to enterprises of all sizes for the first time. It enables you to build, train, validate, and deploy high-performing models faster than ever before. Now, organizations can pursue machine learning initiatives with confidence, knowing that, with SnapLogic Data Science, their odds of seeing a big return on their AI investments have greatly increased.

  • Learn more about all the benefits and features of SnapLogic Data Science.
  • Download the SnapLogic Data Science data sheet.
  • Get a quick overview of SnapLogic Data Science in this short demo video.
  • Watch our latest ML-focused webinar: “Data scientist shortage? No problem. Self-service machine learning made possible by SnapLogic”
Former Chief Data Officer at SnapLogic
