Jun 24, 2025

Production Data Set for five-Axis CNC Milling with multiple Changeovers | Scientific Data

Scientific Data volume 12, Article number: 1067 (2025)

This data descriptor contains information about an extensive production data set for a five-axis CNC milling process. Three geometrically different products were manufactured and relevant features from the numerical control of the machine were recorded. The recorded manufacturing process contains the preparation of the machine for the next product (changeover) as well as the machining process (production). The experimental manufacturing was organized with the aid of a changeover matrix to ensure that all possible changeover combinations for the three products were considered. The production was repeated five times, resulting in 30 manufacturing sessions and five complete changeover matrices. The data set was recorded in a laboratory environment. A rich feature set including, e.g., the NC-code of the products, tool information, and a Jupyter notebook is provided with the data set.

In times of the fourth industrial revolution (Industry 4.0), complete and transparent data are proving to be decisive factors in terms of increasing efficiency in production processes for industrial manufacturing companies1. Manufacturing-related data, especially for specific processes such as milling, is scarce compared to the large number of other data sets in repositories2. For this reason, this data descriptor and the corresponding data set3 also aim to make more production data available for research purposes.

The data was recorded in March 2024 over three days in the machine tool laboratory of the Technical University of Applied Sciences Würzburg-Schweinfurt. The data set3 originates from an experimental production on a five-axis milling machine tool. Industrial production on a machine tool consists of more than the machining process in which the part is geometrically created. In the context of production and manufacturing technology, the term ‘changeover’ describes all the activities necessary to set up a machine or production system for a subsequent production order and return it to its original state4. The changeover process includes all preparatory and follow-up activities, including dismantling previous tools and devices, installation and setup of new tools, performance of test runs, fine adjustments, and cleaning activities (Fig. 1). The changeover is usually conducted by an operator interacting with the manufacturing machines. During this interaction, several specific time components can be identified, such as the changeover basic time or changeover rest time5 (p. 1642). The recorded data set3 covers the entire production process, which includes changeover and machining.

Visualization of changeover and production according to McIntosh et al.19.

For experimental production, three products were chosen to record changeover and production activities. The three products are shown in Fig. 2: a keychain (Product A), a bottle opener (Product B), and a coordinate system representation (Product C). They all differ in the tools used, the changeover time, and the production time. The underlying NC-code for the machining process as well as specifications for tools are provided as external material6 to the data set3. The experimental production was carried out in a laboratory environment of the university by laboratory engineers.

Chosen products for the data set3: keychain (Product A), a bottle opener (Product B), and a coordinate system representation (Product C).

During previous research, a first data set7 was published2. The focus of this data set7 was to publish data from manufacturing under real-world conditions in a running industrial two-shift production. During 13 manufacturing sessions, manufacturing data from 13 different products was recorded. Manufacturing was related to existing customer orders that were not repeated. Production was carried out by trained operators from company personnel. To facilitate the usability of the data set7, the number of features was limited to 20 by a feature selection approach, and the number of labels for production phases was limited to three.

To overcome the limitations mentioned, a new data set3 was prepared, which is described in this article. Table 1 compares the previously published data set7 (left) with the new data set3 (right). In the new data set3, all three products were manufactured ten times, including machining and changeover sequences. This results in 30 manufacturing sessions, exceeding the 13 sessions of the former data set7. Only features that showed no information content were excluded, so that researchers can design their own feature selection approach from the 170 available features of the data set3. While in the former data set7 the labels for the different phases in the changeover and production process were divided into 2, 6, and 23 phases, this data set3 offers many different label sets with up to 73 defined sub-phases. The label sets and corresponding sub-phases were derived from a detailed description of the changeover process, available as external material on GitHub6. In addition to the changeover process, information about the machining process is also available through the NC-code provided for the three products.

The presented data set3 can be used to model technical relationships of the production process between the recorded 170 features. As the changeover phase of production is rich in activities of the human operator interacting with the manufacturing machine, the data set can be used to model human-machine interactions. Here, the various label sets for different production phases can be beneficial, as well as detailed information on the changeover procedure itself. The Section ‘Application examples’ portrays two corresponding examples from the research work of the authors.

The concept of a changeover matrix is explained in the next subsection. In the subsequent sections, the data collection process, data pre-processing, and labeling are described.

To understand the structure of the data set3, it is important to understand the concept of a changeover matrix. A changeover matrix represents all the changeover processes of a machine and their corresponding mean actual changeover times in a transparent form. The changeover matrix shown in Table 2 is an example of the changeover times between three different products of a fictitious machine, with the times given in minutes. The changeover matrix maps all possible setup sequences from one product to the next. The values on the main diagonal are normally zero as no setup process is required for the same product. For better clarity, these entries are marked with an X. The other values in the matrix represent the actual changeover times required to change the machine from one product to another. For example, changing from product A to product B takes 200 minutes, while changing from product B to product C takes 175 minutes. This form of representation makes it possible to quickly identify changeover operations with short or long changeover times, and thus make targeted decisions on which changeover operations should be prioritized or avoided in production planning. In the Enterprise Resource Planning (ERP) software SAP, the setup matrix is used to precisely model sequence-dependent setup times and costs, allowing the optimal sequence of production operations to be determined and production operations to be designed more efficiently8.

The underlying experimental production of this data set3 is organized according to the changeover matrix as shown in Table 2. Three products were produced, and production was organized so that all six possible changeover combinations between the three products were recorded and a complete changeover matrix was established. This sequence was repeated five times for statistical reasons and resulted in 30 manufacturing sessions with five complete changeover matrices.

The five-axis milling machine tool used, a ‘Spinner U5-620’, was built in 2016 and is equipped with a Siemens 840D-SL control. This machine is designed for producing precision parts in various industrial sectors such as aerospace, automotive, and toolmaking. The machine has five axes: three linear axes and two rotary axes. The travel paths of the linear axes and the swivel ranges of the rotary axes of the rotary/tilting table are as follows9 (p. 7-96):

X-axis (linear axis): Max. movement 620 mm from -365 mm to +255 mm

Y-axis (linear axis): Max. movement 520 mm from -296 mm to +224 mm

Z-axis (linear axis): Max. movement 460 mm from 150 mm above table to 610 mm above table

B-axis (rotation axis): Swivel range 200° from -90° to +110°

C-axis (rotation axis): Enables full 360° rotation
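The travel ranges listed above can be captured in a small lookup table, for instance to sanity-check recorded axis positions. The following is a minimal sketch; the names `AXIS_LIMITS`, `travel`, and `within_travel` are illustrative and not part of the data set or the machine control, and the C-axis is treated as unbounded because it allows a full rotation.

```python
# Travel limits from the list above: (min, max) in mm for the linear axes
# and in degrees for the B-axis. The C-axis rotates freely (assumed unbounded).
AXIS_LIMITS = {
    "X": (-365.0, 255.0),
    "Y": (-296.0, 224.0),
    "Z": (150.0, 610.0),   # height above the table
    "B": (-90.0, 110.0),
    "C": None,
}


def travel(axis):
    """Return the total travel of an axis, or None if it is unbounded."""
    limits = AXIS_LIMITS[axis]
    if limits is None:
        return None
    low, high = limits
    return high - low


def within_travel(axis, position):
    """Check whether a position lies inside the travel range of an axis."""
    limits = AXIS_LIMITS[axis]
    if limits is None:
        return True
    low, high = limits
    return low <= position <= high
```

The computed travel values reproduce the maximum movements stated in the list (620 mm, 520 mm, 460 mm, and 200°).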

The data from the machine’s numerical control (NC) is acquired by a ‘uaGate 840D’ interface from Softing which was integrated into the machine’s electrical cabinet. The ‘uaGate 840D’ collects data from both the numerical control kernel (NCK) and the programmable logic controller (PLC) using the SIMATIC S7 protocol. A subset of the available features was selected to be transmitted to a database. The selection was based on the features of the previous research, which was expanded to include tool magazine, axis, and drive information. Values were sent at an interval of 1 s, and only if the value had changed.
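The "send only on change" behavior can be illustrated with a few lines of Python. This is a sketch of the general report-on-change idea, not the gateway's actual implementation; the function name `report_on_change` and the sample readings are made up for illustration.

```python
def report_on_change(samples):
    """Yield (timestamp, value) pairs only when the value differs from the last one sent."""
    last_sent = object()  # sentinel that never equals a real value
    for timestamp, value in samples:
        if value != last_sent:
            yield timestamp, value
            last_sent = value


# Hypothetical readings of one feature, polled once per second.
readings = [(0, 100), (1, 100), (2, 120), (3, 120), (4, 100)]
sent = list(report_on_change(readings))
```

With this scheme, a feature that stays constant produces no new messages, so a later export can contain empty cells for seconds in which only other features changed.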

As shown in Fig. 3, the gateway acts as an MQTT publisher that forwards the machine data received from the controller to the Mosquitto MQTT broker. The Mosquitto broker receives the messages sent by the gateway and plays a central role in the communication setup by making the received data available to the subscribers. In the present use case, the data is collected and processed by the Telegraf Agent application, which acts as an MQTT subscriber, and then stored in an InfluxDB database.

Communication architecture for data acquisition.

The recorded data was exported from the database as a CSV file. Data pre-processing begins by filling the first row of all files so that the information of all features is complete. Some features might have missing values, for instance, when no tool is currently in the spindle or when no NC-code line is being executed. These NaN values were retained, while missing entries of all other features were filled with the last valid value. Subsequently, the data was labeled using the timestamps from the CSV files and the manual recording. A total of 10 distinct labels are identified, as detailed in Tables 3, 4, 5, and 6. Table 3 lists the different label sets. Tables 4, 5, and 6 show the class numbers and descriptions for all the label sets.
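The filling rule described above (carry the last valid value forward, but keep genuine gaps for selected features) could look roughly like this. It is a sketch under assumptions: the column names `spindle_speed` and `tool_name` are hypothetical, missing entries are represented as `None`, and the authors' actual pre-processing code may differ.

```python
def forward_fill(rows, keep_missing=()):
    """Fill None entries with the last valid value of the same column.

    Columns named in `keep_missing` are left untouched, so that their
    gaps (e.g. no tool in the spindle) stay visible.
    """
    last_valid = {}
    filled = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if value is None and name not in keep_missing:
                value = last_valid.get(name)  # stays None if nothing was seen yet
            if value is not None:
                last_valid[name] = value
            new_row[name] = value
        filled.append(new_row)
    return filled


# Hypothetical rows: spindle_speed is filled, tool_name keeps its gaps.
rows = [
    {"spindle_speed": 0.0, "tool_name": None},
    {"spindle_speed": None, "tool_name": "T1"},
    {"spindle_speed": 3000.0, "tool_name": None},
    {"spindle_speed": None, "tool_name": None},
]
filled = forward_fill(rows, keep_missing={"tool_name"})
```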

The next step involved removing the features that did not show a variation in value throughout the entire recording, leaving 170 features. This number of features is considerably higher than in previous studies, due to the inclusion of separate drive and tool information. In total, 52,026 data rows are available. The features consist of the following:

Door, rapid traverse, program, and coolant status

Feed rate, positions, position errors for each axis

Total running time and program running time

Program line content, number, and program path with program name

Spindle speed

Current, torque, modulation depth, temperature, active power, circuit voltage, speed for each axis

Tool information

There are seven axes in this machine, namely X, Y, Z, B, C (see above), the spindle, and the tool change system. A detailed description of the features with datatype, value range, and unit can be found on GitHub6.
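Removing features that never change, as described above for the reduction to 170 features, can be sketched as follows. The column names are hypothetical and the data is assumed to be held as a dictionary of column lists; this is an illustration of the idea rather than the authors' code.

```python
def drop_constant_features(columns):
    """Return only the columns whose values vary at least once."""
    kept = {}
    for name, values in columns.items():
        # A column is kept if any later value differs from its first value.
        if any(v != values[0] for v in values[1:]):
            kept[name] = values
    return kept


# Hypothetical columns: one varying feature and one constant feature.
columns = {
    "door_open": [True, False, True],
    "firmware": ["v1", "v1", "v1"],
}
varying = drop_constant_features(columns)
```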

Table 7 shows the start and end timestamps of each changeover matrix.

Table 8 shows all the tools used in manufacturing the three products. The last three tools were not used for production; only ‘BLUM_REINIGUNGSKOPF’ was used once, to clean the manufacturing chamber after heavy soiling.

The data is stored in one file with a recording interval of 1 s and can be found in a Zenodo repository3.

Product A is the last product that was produced before each changeover matrix starts. The production sequence for each changeover matrix is as follows: B, C, B, A, C, A.
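That this production sequence covers every changeover combination exactly once can be checked with a short script. The constant and function names are illustrative; the sequence, the starting product, and the five repetitions are taken from the text.

```python
from itertools import permutations

PRODUCTS = ("A", "B", "C")
START_PRODUCT = "A"                         # produced before each matrix starts
SEQUENCE = ("B", "C", "B", "A", "C", "A")   # production order within one matrix
REPETITIONS = 5


def changeovers(start, sequence):
    """List the (from, to) changeover pairs implied by a production sequence."""
    previous = (start,) + tuple(sequence[:-1])
    return list(zip(previous, sequence))


required = set(permutations(PRODUCTS, 2))   # the six ordered product pairs
performed = changeovers(START_PRODUCT, SEQUENCE)
covers_all = set(performed) == required and len(performed) == len(required)
total_sessions = len(performed) * REPETITIONS
```

Because the sequence ends with product A, the next changeover matrix starts from the same state, and six changeovers repeated five times give the 30 manufacturing sessions.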

The Data Records are structured as a table and stored as a CSV file, using a comma as the separator. The first two columns contain an index and timestamp formatted as YYYY-MM-DD HH:MM:SS. The following 170 columns contain all the features, with decimal points denoted by a period (‘.’). Boolean values are represented by ‘True’ and ‘False’. The last ten columns contain the labels as numerical values. Each of these label columns corresponds to a specific label set explained in Section ‘Data pre-processing’.
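A reader for this layout might be sketched as follows, using only the Python standard library. The sample header, the feature names, and the date are made up for illustration (the real file has 170 feature columns and ten label columns), and `read_records` and `parse_value` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime


def parse_value(text):
    """Convert a CSV cell to None (empty), bool, float, or leave it as a string."""
    if text == "":
        return None
    if text in ("True", "False"):
        return text == "True"
    try:
        return float(text)
    except ValueError:
        return text


def read_records(handle, n_labels=10):
    """Split each row into timestamp, feature dict, and label dict."""
    reader = csv.reader(handle, delimiter=",")
    header = next(reader)
    feature_names = header[2:len(header) - n_labels]
    label_names = header[len(header) - n_labels:]
    records = []
    for row in reader:
        timestamp = datetime.strptime(row[1], "%Y-%m-%d %H:%M:%S")
        cells = row[2:]
        features = {n: parse_value(c) for n, c in zip(feature_names, cells)}
        label_cells = cells[len(feature_names):]
        labels = {n: int(float(c)) for n, c in zip(label_names, label_cells)}
        records.append((timestamp, features, labels))
    return records


# Tiny made-up sample with two features and two label columns.
sample = io.StringIO(
    ",timestamp,door_open,spindle_speed,Label_01,Label_03\n"
    "0,2024-03-12 09:00:00,True,0.0,1,1\n"
    "1,2024-03-12 09:00:01,False,2500.5,2,1\n"
)
records = read_records(sample, n_labels=2)
```

For the real file, the handle would come from `open(path, newline="")` and `n_labels` would stay at its default of ten.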

Table 9 contains a short description of the external material stored on GitHub6. A flow chart of the entire changeover process is available there, as well as the NC-code for the three manufactured products, detailed information for the used tools listed in Table 8 and a list of interruptions (see also Section ‘Data irregularities’ below) and a feature list with full description (see also Section ‘Data pre-processing’ above). A Jupyter notebook shows how to use the data set including basic pre-processing (see also Section ‘Code example’ below).

In previous research, the authors worked on detecting sub-phases of the changeover process10. It turned out that the more sub-phases are detected, the worse the machine learning algorithms performed. With a growing number of sub-phases to be detected, the amount of data in the single classes for the learning process became imbalanced. Imbalance is a known problem for machine learning algorithms, therefore the imbalance of the data set3 will be evaluated in the next section.

The aim of the data set3 presented was also to improve the statistical validity by repeating the experiments. Although laboratory engineers were trained in advance, a ‘learning curve’ effect can be noticed in the data. This effect is analyzed in the related section.

During previous research, the authors showed that data labeling can affect the machine learning approach, as algorithms can be affected by imbalanced class counts11. Depending on the number of data points in the class, algorithms may not be able to accurately separate these classes from others. In the following, the word phase is used synonymously with class.

Figure 4 shows the frequencies of the individual phases for the approach with two phases, changeover and production (Label_03). It can be seen that there are more than twice as many changeover data points as production data points in the data set3. Due to limited resources, only one series product was manufactured on the machine during the ‘Production’ phase. This is equivalent to a manufacturing order lot size of ‘1’. As machine production is deterministic, it is possible to duplicate the data points of the ‘Production’ phases depending on the desired batch size and supplement the data set3 accordingly. In addition to the oversampling approach described, other techniques, such as cost-sensitive classifiers, can be used12.
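The duplication of ‘Production’ data points for a larger batch size can be sketched on a label sequence alone. The class coding (1 for changeover, 2 for production) is an assumption made for this example and should be checked against the label tables; `expand_production` is a hypothetical helper.

```python
from collections import Counter
from itertools import groupby


def expand_production(labels, production_class, batch_size):
    """Repeat every contiguous production run `batch_size` times."""
    expanded = []
    for label, run in groupby(labels):
        run = list(run)
        repeats = batch_size if label == production_class else 1
        expanded.extend(run * repeats)
    return expanded


# Assumed coding for this sketch: 1 = changeover, 2 = production.
labels = [1, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2]
balanced = expand_production(labels, production_class=2, batch_size=2)
counts_before = Counter(labels)
counts_after = Counter(balanced)
```

In a real application the feature rows would be repeated together with their labels, and the timestamps of the inserted rows would need to be regenerated.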

Occurrences in 2-phases (Label_03).

Figure 5 shows the frequencies of the individual phases for the approach with 12 phases (Label_08). It also shows that changeover phases occur more frequently than production phases (odd numbers: changeover phases, even numbers: production phases). This discrepancy is particularly clear in phases 2, 4, 6, and 8, which describe the production phases of products A and B. Their counts are lower than those of the production phases for product C. Averaged over all recordings, each production process of product C comprises 1,246 data points including interruptions, compared with 238 for product B and 83 for product A. Since, as described above, only one product could be manufactured after the completion of each changeover process, this number of production phases is an expected result. Another reason for the high number of changeover phases in the data set3 is that, as shown in Fig. 1, the ramp-up phase, i.e. the production of the first product, is considered part of the setup phase.

Occurrences in 12-phases (Label_08).

Figure 6 shows the frequencies of the individual phases for the approach with 43 phases (Label_10). Phases that consist only of opening and closing the door have few occurrences (Phases 1, 12, 14, 16, 17, 19, 21, 23, 43). Running the NC-code during changeover (Phase 26) and during production (Phase 40) has the most occurrences (approx. 15,000).

Occurrences in 43-phases (Label_10).

Figure 7 shows the frequencies of the individual phases for the approach with 73 phases (Label_01). As with the 43-phase approach (Label_10), the door opening and closing phases have few occurrences. This applies especially to phases 26 and 28, as they only occur for changeover activities where product C was produced before. The reason is that more tools are needed to produce product C, and all tools that are not needed for the new product are removed from the machine.

Occurrences in 73-phases (Label_01).

Figure 8 shows the durations of the changeover processes for each complete changeover matrix. The durations of the individual changeover combination are shown for each changeover matrix. It can be seen that the changeover processes in changeover matrix ‘1’ consistently have the longest durations, except for the changeover process ‘A to B’. There are two possible explanations:

Learning curve effect.

While executing the first changeover matrix, there was a longer interruption during the changeover from ‘B to C’ and another during the changeover from ‘A to C’, which strongly increased the total durations.
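Durations such as those compared above can in principle be derived from a label column, because the data is stored at a 1 s interval. The following sketch assumes gap-free rows and uses made-up phase names; it is not necessarily how the durations in Fig. 8 were computed.

```python
from itertools import groupby


def segment_durations(labels, sample_period_s=1):
    """Return (label, duration in seconds) for each contiguous run of equal labels."""
    return [
        (label, sum(1 for _ in run) * sample_period_s)
        for label, run in groupby(labels)
    ]


# Made-up label sequence: 5 s changeover, 3 s production, 4 s changeover.
labels = ["changeover"] * 5 + ["production"] * 3 + ["changeover"] * 4
durations = segment_durations(labels)
```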

As the number of repetitions of performed manufacturing operations increases, a so-called ‘learning curve’ emerges. This phenomenon, also known as ‘learning-by-doing’, describes how performance is continuously improved through repeated execution of a work step13 (p. 423). The term ‘experience curve effect’ is also used in the literature to describe the potential to reduce unit costs by 20 to 30 percent for every doubling of the cumulative production volume. In addition to technological advances, this is also based on learning effects that lead to increases in productivity14(p. 115 f.). For changeover, this means that the employee becomes increasingly familiar with the processes with each changeover process carried out, which shortens the changeover time and can extend the production time.
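The "20 to 30 percent per doubling" statement can be turned into numbers with the commonly used power-law form of the experience curve. The source does not prescribe a functional form, so the power law and the function name `unit_cost` are assumptions of this sketch.

```python
import math


def unit_cost(first_unit_cost, cumulative_volume, reduction_per_doubling):
    """Unit cost after `cumulative_volume` units under a constant experience curve.

    `reduction_per_doubling` is the relative cost reduction for every doubling
    of the cumulative volume, e.g. 0.2 for 20 percent.
    """
    progress_ratio = 1.0 - reduction_per_doubling   # e.g. 0.8 for a 20 % reduction
    exponent = math.log2(progress_ratio)            # negative for ratios below 1
    return first_unit_cost * cumulative_volume ** exponent


cost_after_2 = unit_cost(100.0, 2, 0.2)   # one doubling
cost_after_4 = unit_cost(100.0, 4, 0.2)   # two doublings
```

With a 20 percent reduction, a first-unit cost of 100 falls to about 80 after one doubling and to about 64 after two. The same form could be applied to changeover times instead of costs.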

A reduction of changeover times from changeover matrix ‘1’ to ‘2’ can be clearly identified for the changeovers ‘C to B’ and ‘B to A’ and slightly for ‘C to A’ (see Fig. 8). Despite prior instruction and training, the laboratory engineer was not yet fully familiar with the procedures. One reason may be that, unlike experienced industrial workers, laboratory engineers at universities do not carry out changeover activities daily.

If the data set3 is to be used for modeling changeover times, the authors recommend not using the first and possibly also the second changeover matrix or correcting the data set using interpolation techniques. As the ‘learning curve effect’ is relevant and known, the relevant data was left unchanged in the data set3.

As all interruptions are labeled in the data set3, it is also possible to exclude interruptions during pre-processing. See Label_06 and Label_07 and the corresponding description of data irregularities in the GitHub6 material.
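Both recommendations, skipping early changeover matrices and excluding labeled interruptions, amount to simple row filters. In the sketch below, the class value that marks an interruption and the cut-off timestamp are placeholders: the real values have to be taken from the label tables, Table 7, and the GitHub material. `select_rows` is a hypothetical helper.

```python
from datetime import datetime


def select_rows(rows, label_key, interruption_classes, not_before=None):
    """Drop rows labeled as interruptions and, optionally, rows before a cut-off."""
    kept = []
    for row in rows:
        if row[label_key] in interruption_classes:
            continue
        if not_before is not None and row["timestamp"] < not_before:
            continue
        kept.append(row)
    return kept


# Made-up rows; class 1 is assumed to mark an interruption in this example.
rows = [
    {"timestamp": datetime(2024, 3, 12, 9, 0, 0), "Label_06": 0},
    {"timestamp": datetime(2024, 3, 12, 9, 0, 1), "Label_06": 1},
    {"timestamp": datetime(2024, 3, 13, 9, 0, 0), "Label_06": 0},
    {"timestamp": datetime(2024, 3, 13, 9, 0, 1), "Label_06": 1},
]
without_interruptions = select_rows(rows, "Label_06", {1})
later_only = select_rows(rows, "Label_06", {1}, not_before=datetime(2024, 3, 13))
```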

The data set3 is available on the Zenodo platform. External material is available on the GitHub platform6. The license for the data set3 and all external material is Creative Commons Attribution 4.0 International15. Researchers are free to share and adapt the presented data set, but they must give credit to the authors with a reference, provide a link to the license, and indicate changes to the original data set3 and other additional material.

In previous research, the authors have worked on different application examples with an older data set7 (see Table 1):

An application example is forecasting the energy demand of basic G-commands from the NC-code. Here, Latin Hypercube Sampling is used as an efficient method of Design of Experiments to train a machine learning model for the forecasting16.

Another application example is the automatic detection of human changeover activity from the NC data. No additional manual feedback from the operator is needed. Various machine learning models, such as random forests and neural networks, were analyzed for their ability to detect changeovers accurately. It was also analyzed whether sub-phases of the changeover process can be detected from the machine’s NC data alone10.

This automatic changeover detection can also be accomplished by applying time series machine learning techniques to classify phases of the machine changeover17.

Although the authors applied an existing data set7, the application examples are valid for the new data set3 as well. The authors are currently working on validating the previous research results with the new data set3. The application example of classifying Label_03 with machine learning was chosen as a code example applying the new data set3 (please see Section ‘Code example’).

Although thoroughly planned and executed, even in laboratory production, deviations from the optimal manufacturing process can occur. Deviations usually occur in the production process where people are involved. In the case of this sample production, this is the changeover process. The subsequent production process, on the other hand, runs in the milling machine using a deterministic NC program and is not subject to human influence if the NC program has been tested in advance and is within verified machine production parameters.

The changeover processes of the machine for the production of the three products were summarized and abstracted in a standard changeover process. A description of this changeover standard is published on GitHub6. The faults in the setup process were documented, and the data points were clearly labeled accordingly. The documentation of the individual deviations and the names of the corresponding labels are also uploaded to GitHub6.

As deviations from planned procedures also occur in series production, where they are also seen as a source of potential improvements18, it was decided to leave the faults in the data set3. It is recommended that the user either correct the deviations or filter them out of the data set3 using the corresponding labels.

The Jupyter notebook included in the external materials serves as an example of applying machine learning techniques to the data set3. Initially, the data is imported, and the boolean and string columns are converted into numerical formats. Missing values are replaced by zeros. Subsequently, the data set3 is partitioned into training and testing subsets and standardized. The final step involves training a Random Forest model and assessing its performance on the test set6.
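The pre-processing steps named above can be sketched without external libraries. This is not the notebook's code: the helper names, the integer coding of strings, and the random split are assumptions, and model training (for example with scikit-learn's `RandomForestClassifier`) would follow on the prepared columns.

```python
import random
import statistics


def encode(column):
    """Map booleans to 0/1, strings to integer codes, and missing values to 0."""
    codes = {}
    encoded = []
    for value in column:
        if value is None:
            encoded.append(0.0)
        elif isinstance(value, bool):
            encoded.append(float(value))
        elif isinstance(value, str):
            # Codes start at 1 so that 0 stays reserved for missing values.
            encoded.append(float(codes.setdefault(value, len(codes) + 1)))
        else:
            encoded.append(float(value))
    return encoded


def split_indices(n_rows, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into training and test indices."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]


def standardize(train, test):
    """Scale both subsets with the mean and standard deviation of the training subset."""
    mean = statistics.fmean(train)
    std = statistics.pstdev(train) or 1.0   # avoid division by zero
    return [(v - mean) / std for v in train], [(v - mean) / std for v in test]


encoded = encode([True, None, "G1", "G0", "G1", 2.5])
train_idx, test_idx = split_indices(8)
train_scaled, test_scaled = standardize([1.0, 2.0, 3.0], [2.0])
```

Computing the scaling statistics on the training subset only keeps information from the test subset out of the model. For time-series data, a split by session rather than a random shuffle may be the more cautious choice.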

The Python code for simple preprocessing and training a Random Forest can be found in a Jupyter notebook on GitHub: https://github.com/ElMoe/Production-Data-Set-for-Five-Axis-CNC-Milling-with-Multiple-Changeovers.

Ahlfeld, M. et al. Fokusthema: Daten im Kontext von Industrie 4.0. In Bundesministerium für Wirtschaft und Energie (BMWi) (ed.) Fokusthema: Daten im Kontext von Industrie 4.0 (Spreedruck Berlin GmbH, 2016).

Schmitt, A.-M. & Engelmann, B. A series production data set for five-axis cnc milling. Data 9, https://doi.org/10.3390/data9050066 (2024).

Engelmann, B., Schmitt, A.-M. & Martinez, M. Production data set for five-axis cnc milling with multiple changeovers. Zenodo, V1.0.0, https://doi.org/10.5281/zenodo.14094887 (2024).

Störmer, O. & Stowasser, S. Industrial Engineering - Standardmethoden zur Produktivitätssteigerung und Prozessoptimierung (Carl Hanser Verlag GmbH & Company KG, 2015).

Böge, A. et al. Handbuch Maschinenbau: Grundlagen und Anwendungen der Maschinenbau-Technik https://books.google.de/books?id=KRaSzQEACAAJ (Springer Fachmedien Wiesbaden, 2021).

Engelmann, B. & Schmitt, A.-M. ElMoe/Production-Data-Set-for-Five-Axis-CNC-Milling-with-Multiple-Changeovers. https://github.com/ElMoe/Production-Data-Set-for-Five-Axis-CNC-Milling-with-Multiple-Changeovers (2025).

Engelmann, B. & Schmitt, A.-M. Series production data set for 5-axis cnc milling. Zenodo, V1.0.0 https://doi.org/10.5281/zenodo.10853254 (2024).

SAP AG. Master data for production planning. https://help.sap.com/doc/saphelp_scm700_ehp02/7.0.2/en-US/8e/4ec95360267614e10000000a174cb4/frameset.htm [Online; accessed 23-September-2024] (2024).

SPINNER Werkzeugmaschinenfabrik GmbH. Universal- Bearbeitungszentrum U3/U4/U5-620/1520 V2 mit Steuerung Siemens 840Dsl. SPINNER Werkzeugmaschinenfabrik GmbH.

Engelmann, B., Schmitt, A.-M., Theilacker, L. & Schmitt, J. Implications from legacy device environments on the conceptional design of machine learning models in manufacturing. Journal of Manufacturing and Materials Processing 8, https://doi.org/10.3390/jmmp8010015 (2024).

Engelmann, B. et al. Detecting changeover events on manufacturing machines with machine learning and nc data. Applied Artificial Intelligence 38, 2381317, https://doi.org/10.1080/08839514.2024.2381317 (2024).

Haixiang, G. et al. Learning from class-imbalanced data: Review of methods and applications. Expert systems with applications 73, 220–239 (2017).

Glock, C. H., Grosse, E. H., Jaber, M. Y. & Smunt, T. L. Applications of learning curves in production and operations management: A systematic literature review. Computers & Industrial Engineering 131, 422–441 (2019).

Gabath, C. Gewinngarant Einkauf: Nachhaltige Kostensenkung ohne Personalabbau (Gabler Verlag, 2008). https://books.google.de/books?id=lPo7U1MEOZIC.

Creative Commons. Cc by 4.0 deed attribution 4.0 international. https://creativecommons.org/licenses/by/4.0/deed.en (2024). [Online; accessed 21-March-2024].

Schmitt, A.-M., Miller, E., Engelmann, B., Batres, R. & Schmitt, J. G-code evaluation in cnc milling to predict energy consumption through machine learning. Advances in Industrial and Manufacturing Engineering 8, 100140 (2024).

Schmitt, A.-M., Antonov, A., Schmitt, J. & Engelmann, B. Classification of production process phases with multivariate time series techniques. In 2024 22nd International Conference on Research and Education in Mechatronics (REM), 210–217 (IEEE, 2024).

Productivity Press Development Team. Standard Work for the Shopfloor. The Shopfloor Series. https://books.google.de/books?id=ye7gwAEACAAJ (Taylor & Francis, 2002).

McIntosh, R., Culley, S., Mileham, A. & Owen, G. Improving Changeover Performance (Butterworth-Heinemann, 2001).

The authors gratefully thank the team of the Machine Laboratory of the Faculty of Mechanical Engineering for their contributions to the research. The publication is supported by the publication fund of the Technical University of Applied Sciences Würzburg-Schweinfurt.

Open Access funding enabled and organized by Projekt DEAL.

Technical University of Applied Sciences Würzburg-Schweinfurt, Institute of Digital Engineering (IDEE), Schweinfurt, 97421, Germany

Mario Martinez, Anna-Maria Schmitt, Andreas Schiffler & Bastian Engelmann

Conceptualization and Methodology: M.M., A.-M.S., and B.E.; Software: A.-M.S., and A.S.; Hardware: A.S.; Validation: A.-M.S., and B.E.; Formal analysis: B.E., and A.-M.S.; Resources, B.E.; Data curation: A.-M.S.; Writing—original draft preparation: M.M., A.-M.S., and B.E.; Writing—review and editing, A.-M.S., and B.E.; Visualization: M.M., and A.-M.S.; Supervision and Project Administration: B.E.; Funding acquisition: B.E. All authors reviewed the manuscript.

Correspondence to Bastian Engelmann.

The authors declare no competing interests.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Martinez, M., Schmitt, AM., Schiffler, A. et al. Production Data Set for five-Axis CNC Milling with multiple Changeovers. Sci Data 12, 1067 (2025). https://doi.org/10.1038/s41597-025-05294-0

Received: 18 November 2024

Accepted: 22 May 2025

Published: 23 June 2025

DOI: https://doi.org/10.1038/s41597-025-05294-0
