HighLoad++ is a professional conference for developers of high-load systems and a key event for everyone involved in building large, high-traffic, complex projects.
Dmitrii Bundin
BigData Developer
Increasing file I/O performance for JVM on Linux
Abstract
Analytical platforms process ever more data, and the question of their performance arises more and more often. Each platform has its own optimization methods and techniques, but to create a truly high-performance system we must have a deep understanding of all the platform's constituent elements and its data flows. The most expensive and most frequent operation in a data delivery system is working with the file system, so it needs to be optimized. In Dmitrii's talk we will consider methods for optimizing file I/O and data copying, and how they can increase data transfer throughput by 20%.
Vladimir Baev
Senior BigData Developer
Apache Airflow in production: tips and pitfalls
Abstract
Apache Airflow is a workflow orchestration tool widely used for building ETL data pipelines. We will start with an overview of Airflow's core concepts and then take a more in-depth look from the perspective of production requirements and use cases. Which Airflow features are extremely useful in a production environment, and which could potentially break your system? We will also discuss possible extensions and tools that may simplify a Data Engineer's life.