The Practice of Implementing ML Service into an Internet Business Application

Currently, a large number of articles describe the theoretical aspects of development in the field of machine learning, whereas the experience of applying these results in real systems is described much less often. Typically, authors report the efficiency, accuracy, and other performance metrics of the resulting solution, but the work stops at the prototype stage. At the same time, the behaviour of a trained model in real conditions, rather than on test data, can differ considerably from the indicators obtained at the development stage. This article describes the experience of implementing and operating a classification service based on machine learning techniques.


I. INTRODUCTION
Scientific articles on the development of new machine learning methods, or the improvement of existing ones, are constantly published in journals and conference proceedings. However, against the backdrop of this large amount of academic research, the number of articles describing the practice of using machine learning to solve real-world problems is comparatively small.
This article describes the experience of using a classification model in a real working business system. The task was to create an internal service that returns a set of relevant categories for a given text description. At the beginning of the project, there was a database with a large number of texts, each related to one or more categories. As part of the task, the data were imported and prepared for training, several types of classification models were trained and their effectiveness compared, a microservice was created, and its load testing was performed.

II. GENERAL ARCHITECTURE
Python was chosen as the main programming language for the system being created. Currently, it is one of the most popular tools for creating systems in the field of machine learning [1]. The language has a large number of libraries, both directly intended for training classification models and for solving related problems, such as importing data, cleaning it, and lemmatization [1]. The main libraries used in the project:
• csv: for generating files in CSV format;
• zipfile: for unpacking and packing archived files;
• shutil: for recursive work with the file system;
• pymysql: for working with a MySQL database;
• pickle: for serializing trained models to files;
• pandas: for convenient in-memory presentation of data from a CSV file;
• sklearn.*: for importing various classification models [2].
A useful feature of the language is support for virtual environments (virtualenv), which isolate the software libraries used in the project from the system-wide versions installed in the underlying operating system, the host.
Most modern projects use version control systems both during development and for distribution. Systems such as Git or Mercurial [3] make it convenient to organize the development process among a large number of developers, provide decentralized storage of all source code, and maintain a detailed revision history. The created system also uses a private Git repository to store all project files.
Modern versions of the command shell (console), in addition to plain text input and output, provide various options for presenting information. For example, for operations that take a long time to complete, it is convenient to show an interactive indicator of the percentage of work done. It is also convenient to let the user customize script parameters interactively. For example, a simple Python helper function, user_yes_no_query, lets the user confirm or reject a question from the script. The following call displays a request to delete the directory with old data and writes the answer to the delete_old_data variable (Fig. 1):

    delete_old_data = user_yes_no_query('Delete old generated data folder?')

To train a high-quality model, the database must contain a large amount of consistent information. In this case, about 600 000 records were used, with several categories for each of them. The relationship between texts and categories in the database is shown in Fig. 2.
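The body of the user_yes_no_query helper mentioned above is not reproduced in the text. A minimal sketch of how such a helper might look is given here; the accepted answers, the prompt suffix, and the `ask` parameter (added only so that the function can be exercised without a real console) are assumptions, not the original implementation.

```python
def user_yes_no_query(question, ask=input):
    """Ask a yes/no question until a recognizable answer is given.

    `ask` defaults to the built-in input(); it is a parameter only so the
    function can be tried without a console (an assumption of this sketch).
    """
    yes_answers = {"y", "yes"}
    no_answers = {"n", "no"}
    while True:
        # Normalize the answer so that "Y", " yes " and "YES" are all accepted.
        answer = ask(f"{question} [y/n]: ").strip().lower()
        if answer in yes_answers:
            return True
        if answer in no_answers:
            return False
        print("Please answer 'y' or 'n'.")
```

Called as in the article, the function returns True or False, so the result can be used directly in a condition such as `if delete_old_data: ...`.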
Initially, texts and categories are stored in different database tables, but during the import process they are combined into one common CSV file. The complete process of obtaining a classification model from records in the database is shown in Fig. 3.
The process consists of the following steps:
1. Getting records from the database, formatting them, and saving them as a CSV file.
2. Training the model on data from the CSV file and serializing the resulting model into a PICKLE file.
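The first step can be illustrated with a short sketch that joins text rows with their category links and writes one CSV row per text. The column names, the "|" separator for multiple categories, and the in-memory sample rows (standing in for what a pymysql cursor would return) are assumptions made for illustration only.

```python
import csv
import io
from collections import defaultdict


def write_training_csv(texts, text_categories, out_file):
    """Join texts with their categories and write one CSV row per text.

    texts: iterable of (text_id, text) rows, as a texts table might return.
    text_categories: iterable of (text_id, category) link rows.
    Returns the number of data rows written.
    """
    categories_by_text = defaultdict(list)
    for text_id, category in text_categories:
        categories_by_text[text_id].append(category)

    writer = csv.writer(out_file)
    writer.writerow(["text", "categories"])  # hypothetical column names
    written = 0
    for text_id, text in texts:
        categories = categories_by_text.get(text_id)
        if not categories:
            continue  # a text without categories cannot be used for training
        writer.writerow([text, "|".join(categories)])
        written += 1
    return written


# In-memory stand-ins for the rows a database cursor would return.
texts = [
    (1, "Red leather sofa, 3 seats"),
    (2, "Mountain bike, 21 gears"),
    (3, "Text without any category"),
]
links = [(1, "furniture"), (1, "living room"), (2, "sport")]

buffer = io.StringIO()
row_count = write_training_csv(texts, links, buffer)
csv_text = buffer.getvalue()
```

Writing to a file opened with `open(path, "w", newline="", encoding="utf-8")` instead of the StringIO buffer gives the CSV file used in the second step.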
While formatting the text, it is possible to remove unnecessary characters (digits, punctuation marks, etc.) and to reduce words to their normal form (lemmatization). Depending on the characteristics of the texts used for training, a model trained on normalized text can show either better or worse results than a model trained on non-normalized text.
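The character clean-up described above might be sketched as follows. The exact set of removed characters and the lowercasing are assumptions of this example; lemmatization is left out because it requires an external language-specific library.

```python
import re
import string

# Runs of ASCII punctuation and digits are replaced by a single space.
_PUNCT_OR_DIGIT = re.compile(f"[{re.escape(string.punctuation)}0-9]+")
_WHITESPACE = re.compile(r"\s+")


def clean_text(text):
    """Lowercase the text, drop digits and punctuation, collapse whitespace."""
    text = _PUNCT_OR_DIGIT.sub(" ", text.lower())
    return _WHITESPACE.sub(" ", text).strip()


cleaned = clean_text("Red leather sofa, 3 seats (NEW!)")
print(cleaned)  # red leather sofa seats new
```

Since normalization can either help or hurt, it is reasonable to keep such a function optional and to compare models trained with and without it.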

III. TRAINING AND MODEL SELECTION
Usually, when training a model, the base dataset is divided into two parts, training and test; the first is used for training, and the second is used to assess the quality of the built model. Splitting a sample into parts manually is not always convenient, so the sklearn package provides a special function for this, train_test_split(), which automatically splits the sample into the required parts.
When creating a machine learning model, the most important issue is to evaluate the effectiveness of the trained classifier. One of the simplest and most effective ways to obtain this estimate is to compare the categories predicted by the model for the test sample with the real categories marked in the test sample. For this purpose, it is possible to use the algorithm shown in Fig. 4 or the same algorithm written as code, in which the clf variable contains the object of the classifier model under test. After executing this code, the console displays the percentage of correctly classified records, which can be considered a measure of the accuracy of the resulting classifier.
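The splitting and accuracy check described above can be sketched as follows. The toy texts, the CountVectorizer features, the 25 % test share, and the choice of MultinomialNB as the classifier under test are assumptions made so that the example runs on its own; only train_test_split(), predict(), and the names clf, train_x, and train_y come from the article.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Toy data standing in for the (text, category) records of the CSV file.
texts = [
    "red leather sofa", "oak dining table",
    "soft armchair with cushions", "wooden bookshelf",
    "mountain bike with gears", "running shoes size 42",
    "tennis racket and balls", "football goal net",
]
labels = ["furniture"] * 4 + ["sport"] * 4

features = CountVectorizer().fit_transform(texts)

# Hold out 25 % of the records for testing, keeping the class balance.
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

clf = MultinomialNB().fit(train_x, train_y)  # classifier model under test

# Compare predicted categories with the real ones from the test sample.
predicted = clf.predict(test_x)
correct = sum(1 for guess, real in zip(predicted, test_y) if guess == real)
accuracy = 100.0 * correct / len(test_y)
print(f"Correctly classified: {accuracy:.1f}%")
```

On a dataset this small the printed percentage is not meaningful; the point is only the shape of the procedure.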

Information Technology and Management Science, 2021/24
Since the various models from the scikit-learn package share a common interface, they all have a predict() method, which allows the quality of different models to be evaluated using the same code. In the future, this common interface can provide interesting opportunities for automatic selection of the best model. To do this, it is only necessary to prepare a list of all models that are suitable for the current task, together with an array of possible parameters for each. Additionally, for each of the possible parameters, its type must be indicated (nominal, discrete, ordinal, numeric, etc.). Depending on the data type, either a set of possible values or a range with a step size must also be specified. As a result, the program automatically applies all algorithms to the training data and tests the effectiveness of each resulting model on the test sample. The parameter values can be selected using genetic algorithms: initially, several random sets of possible parameter values are formed, and then this population evolves, improving the quality of the parameter set used. In the end, only the best algorithm with the best set of parameter values found is used for classification. Such a method can require significant time and resources, but the final model can show good performance indicators, and against this background the cost of finding it may not be significant.
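The idea of selecting a model automatically through the common interface can be sketched as below. Note that this sketch uses a plain exhaustive search over small parameter grids, not the genetic search mentioned above; the candidate list, the parameter values, and the toy data are assumptions chosen only to keep the example short and runnable.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import ParameterGrid, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier

# Each candidate: a model class plus the parameter values to try (hypothetical).
candidates = [
    (MultinomialNB, {"alpha": [0.1, 1.0]}),
    (DecisionTreeClassifier, {"max_depth": [3, None], "random_state": [0]}),
    (SGDClassifier, {"loss": ["hinge"], "max_iter": [1000],
                     "tol": [1e-3], "random_state": [0]}),
]


def select_best_model(candidates, train_x, train_y, test_x, test_y):
    """Train every candidate/parameter combination and keep the most accurate."""
    best = None
    for model_class, grid in candidates:
        for params in ParameterGrid(grid):
            clf = model_class(**params).fit(train_x, train_y)
            score = clf.score(test_x, test_y)  # share of correct predictions
            if best is None or score > best[0]:
                best = (score, clf, params)
    return best


texts = [
    "red leather sofa", "oak dining table",
    "soft armchair with cushions", "wooden bookshelf",
    "mountain bike with gears", "running shoes size 42",
    "tennis racket and balls", "football goal net",
]
labels = ["furniture"] * 4 + ["sport"] * 4
features = CountVectorizer().fit_transform(texts)
train_x, test_x, train_y, test_y = train_test_split(
    features, labels, test_size=0.25, random_state=0, stratify=labels
)

best_score, best_clf, best_params = select_best_model(
    candidates, train_x, train_y, test_x, test_y
)
print(type(best_clf).__name__, best_params, best_score)
```

A genetic search would replace the two nested loops with a population of parameter sets that is mutated and recombined between rounds, while the fit-and-score step stays the same.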
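The process of training the model itself is relatively simple and consists of three stages: the required library is imported, the training parameters of the selected model are set, and the training code is called with the training and test samples. Examples of training different models are given below.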

Naive Bayes
Naive Bayes classifiers [4] are often used for text classification because of their speed and, in some cases, good accuracy.

Classification tree
The classification tree [5] is a powerful and popular text classification method. It shows especially good results when the set of training samples is consistent. A simple decision tree structure is presented in Fig. 5.

    from sklearn.tree import DecisionTreeClassifier
    clf = DecisionTreeClassifier(random_state=0).fit(train_x, train_y)

    from sklearn import linear_model
    # Linear classifier trained with SGD; loss='log' selects logistic regression
    # (scikit-learn 1.1 and later name this loss 'log_loss').
    clf = linear_model.SGDClassifier(max_iter=1000, tol=1e-3, loss='log').fit(train_x, train_y)

SVM classifier variant
Support Vector Machines [10] (Fig. 9) are another popular approach to text classification with machine learning.

One-vs-the-rest (OvR) multiclass strategy
OvR [11] is a popular heuristic method for applying binary classification algorithms to multi-class classification. The common approach to OvR classification is presented in Fig. 10.

Depending on the type of model and the training settings, the time required to build the final classifier can vary greatly. With 600 000 records, some of the models take 15 minutes to train and some take hours. At the same time, it is impossible to assess the quality of a model without spending time on its training. Therefore, testing the influence of various parameters on the final quality of the classification can be quite time-consuming.
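No training code accompanies the Naive Bayes, SVM, and one-vs-the-rest descriptions above, so a possible sketch is given here, together with a simple timing of each fit. The specific estimator classes (MultinomialNB, LinearSVC, OneVsRestClassifier), the TF-IDF features, and the toy data are assumptions; the article does not state which classes or settings were actually used.

```python
import time

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Toy three-class data standing in for the real training records.
texts = [
    "red leather sofa", "oak dining table", "wooden bookshelf",
    "mountain bike with gears", "running shoes", "tennis racket",
    "laptop with ssd drive", "wireless keyboard", "gaming monitor",
]
labels = ["furniture"] * 3 + ["sport"] * 3 + ["electronics"] * 3

train_x = TfidfVectorizer().fit_transform(texts)
train_y = labels

models = {
    "naive_bayes": MultinomialNB(),
    "svm": LinearSVC(random_state=0),
    # OvR wraps a binary classifier and trains one copy per category.
    "ovr_svm": OneVsRestClassifier(LinearSVC(random_state=0)),
}

trained = {}
for name, model in models.items():
    started = time.perf_counter()
    trained[name] = model.fit(train_x, train_y)
    elapsed = time.perf_counter() - started
    print(f"{name}: trained in {elapsed:.3f} s")
```

On the toy data all fits finish almost instantly; the same loop applied to the full dataset is one way to observe the large differences in training time noted above.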
The size of the resulting PICKLE file with the model is small for most models. In this work, it varied from 0.5 to 2.5 MB depending on the classifier.
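Serializing a trained model and checking the size of the resulting file might look as follows. The file name, the tiny numeric training set, and the use of a temporary directory are assumptions of this sketch.

```python
import os
import pickle
import tempfile

from sklearn.tree import DecisionTreeClassifier

# A tiny model standing in for the real trained classifier.
train_x = [[0, 0], [0, 1], [1, 0], [1, 1]]
train_y = ["a", "a", "b", "b"]
clf = DecisionTreeClassifier(random_state=0).fit(train_x, train_y)

with tempfile.TemporaryDirectory() as workdir:
    model_path = os.path.join(workdir, "model.pickle")  # hypothetical name

    # Serialize the trained model to a binary file.
    with open(model_path, "wb") as model_file:
        pickle.dump(clf, model_file)
    size_bytes = os.path.getsize(model_path)

    # Load it back, as the service would do at start-up.
    with open(model_path, "rb") as model_file:
        restored = pickle.load(model_file)

same_predictions = list(restored.predict(train_x)) == list(clf.predict(train_x))
print(f"model file size: {size_bytes} bytes, same predictions: {same_predictions}")
```

Because unpickling can execute arbitrary code, a model file should be loaded only when it comes from a trusted source, and preferably with the same scikit-learn version that produced it.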

IV. CREATING MICRO-SERVICE
After the most effective model has been obtained, the stage of creating a microservice begins, which makes it possible to use the model within a real service. For this purpose, an approach based on a combination of Supervisor [12] + Nginx [13] = JSON API [14] technologies is used.
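The code of the worker process that produces the JSON response is not shown in the article. As a rough sketch, assuming that each Supervisor-managed process exposes a WSGI application behind Nginx (the article does not say which interface is used), a handler could look like this; the classify() stub, the response field names, and the error format are hypothetical.

```python
import json
from urllib.parse import parse_qs


def classify(text):
    """Hypothetical stand-in for a call to the unpickled model."""
    return ["example-category"] if text.strip() else []


def application(environ, start_response):
    """WSGI entry point: read ?text=... and answer with a JSON document."""
    query = parse_qs(environ.get("QUERY_STRING", ""))
    text = query.get("text", [""])[0]
    if not text:
        status = "400 Bad Request"
        payload = {"error": "parameter 'text' is required"}
    else:
        status = "200 OK"
        payload = {"categories": classify(text)}
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [
        ("Content-Type", "application/json; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

For a local try-out, `wsgiref.simple_server.make_server("", 8000, application).serve_forever()` from the standard library can host the function; in a real deployment the model would be loaded once per process at start-up rather than on every request.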
The structure of the created solution is shown in Fig. 11. A pool of processes managed by Supervisor is created in the system. The result of the script is passed back to Nginx, which then sends it to the requesting device. As a result, the created service becomes available from the internal network or the Internet. Depending on the expected load, the number of processes can be significantly increased. The speed of the server is also important. Currently, a pool of 32 processes provides classification on a public service with a response time of 250 ms per request. As long as the number of simultaneous requests does not exceed what the pool can serve, the processing time for each request remains constant. When queues appear, the service response time noticeably increases. Therefore, it is important to estimate the maximum planned load correctly from the start and to set a suitable number of processes in the pool.
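A back-of-the-envelope estimate of the pool size can be made from the figures above. The sketch assumes that every request occupies exactly one process for the whole response time and that the response time does not depend on load; the peak rate of 200 requests per second is a hypothetical input, not a figure from the article.

```python
import math


def max_throughput(processes, response_time_s):
    """Requests per second the pool can sustain before a queue forms."""
    return processes / response_time_s


def required_processes(peak_requests_per_s, response_time_s):
    """Smallest pool that serves the peak rate without queueing."""
    return math.ceil(peak_requests_per_s * response_time_s)


capacity = max_throughput(32, 0.250)     # 32 processes, 250 ms per request
needed = required_processes(200, 0.250)  # hypothetical peak of 200 requests/s

print(f"capacity: {capacity:.0f} requests/s")  # capacity: 128 requests/s
print(f"processes needed: {needed}")           # processes needed: 50
```

In practice some headroom above this estimate is advisable, since real traffic arrives in bursts rather than evenly.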
The response time of the service can be measured directly using the standard Linux utility curl:

    curl -o /dev/null -s -w 'Total: %{time_total}s\n' 'https://ml.services.company.com/model/?text=lorem%20ipsum'

For more detailed testing of a service under load, the ApacheBench utility [15] is often used, which allows evaluating the response time of a service under different numbers of parallel requests. The utility accepts three main parameters as input:
• -n: the number of requests to send;
• -t: a duration in seconds after which ab will stop sending requests;
• -c: the number of concurrent requests to make.
An example of testing the response speed of the service with 100 requests in total, 10 of them sent concurrently, looks as follows:

    ab -n 100 -c 10 'https://ml.services.company.com/model/?text=lorem%20ipsum'

At the end of the run, the utility displays a summary table with the results (Table I), which contains the minimum, average, and longest time it took to receive a response from the service.
If the service is accessible from the Internet, it is important to ensure its security. This topic is beyond the scope of this article, and for services working in the Supervisor + Nginx bundle, detailed descriptions of how to ensure resistance to hacking are available. Let us just say that it is important to pay attention to such concepts as request throttling and fail2ban.
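Returning to the load measurements: the kind of summary that ApacheBench reports (minimum, mean, and longest response time) can also be collected from a Python script, which is convenient when the test has to be part of an automated check. The sketch below is not a replacement for ab; the function names are hypothetical, and a sleeping stub is used instead of a real HTTP call so that the example runs without network access.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_call(send_request):
    """Return the wall-clock duration of one request in seconds."""
    started = time.perf_counter()
    send_request()
    return time.perf_counter() - started


def load_test(send_request, total_requests, concurrency):
    """Send total_requests requests, at most `concurrency` at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(
            pool.map(lambda _: timed_call(send_request), range(total_requests))
        )
    return {
        "requests": len(durations),
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
    }


def fake_request():
    """Stand-in for an HTTP call to the service; sleeps instead."""
    time.sleep(0.01)


# Same shape as `ab -n 100 -c 10`: 100 requests, 10 at a time.
summary = load_test(fake_request, total_requests=100, concurrency=10)
print({key: round(value, 4) for key, value in summary.items()})
```

To test a real endpoint, fake_request can be replaced by a function that calls `urllib.request.urlopen(url).read()` for the service URL.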
Researchers often perform an automatic analysis of vulnerabilities using pentest services [16]. This type of service makes it possible to assess the presence of typical vulnerabilities in various services. When creating a publicly available service, the programmer is not always able to provide protection against all known and unknown vulnerabilities at the time of writing the code. Some vulnerabilities are closed by updating the operating system and the libraries used by the service, but such updates cannot protect against human error. Therefore, there are both free and paid professional tools for automatic and semi-automatic testing of services for a wide variety of vulnerabilities. Usually, such a checking service generates a large number of requests to the service under test using various POST, GET, and PUT parameters and various header keys, as well as direct requests to system configuration files and log files, which may contain important data not intended for public access. In the event of any atypical response from the service under test, the verification system signals a potentially found vulnerability.

V. CONCLUSION AND FUTURE WORK
In conclusion, the description of the system given in this article omits many implementation details, but at a general level it shows the structure of an ML-based application actually working in business, which is the aim of the article.
Despite its apparent simplicity, the described system has been working successfully for several years. The only problem that has arisen during this time was the need to increase the number of processes in the Supervisor pool when, due to the launch of an advertising campaign, an unexpectedly large number of requests began to enter the system. Also, the system does not have an automated tool for retraining the model: to update it, the new model has to be uploaded manually and both Supervisor and Nginx restarted. However, the model is updated very rarely, so there has been no real need to automate this.
The overall effectiveness of the created solution is assessed by the company's management on the basis of the financial benefits received from its implementation. Since the new service reduced the amount of manual work and increased profit, its implementation was recognized as completely successful.
In the future, it is planned both to expand the number of automatic classifiers and to improve their quality in order to further reduce manual work. It is also planned to introduce automatic methods of retraining, quality testing, and updating of models, so that the system is continuously improved.

Fig. 1. Request when executing a script in the console.

1. The required library is imported.
2. Parameters of training of the selected model are set.
3. The training code is called, into which the training and test samples are transferred.
Examples of training different models are given below.