by Ana Jimenez Santamaria, Floor Drees, Natali Vlatko & Mirko Boehm
Track: Keynote
Room: Melody
The panel will discuss how EU legislation affects the daily work of open source operations (upstream contribution to open source projects, open source compliance, etc.), focusing on how these laws impact open source professionals working in OSPOs or similar entities. Panelists will cover some of the recent policy updates, the challenges of staying compliant when managing open source contribution and usage within organizations, and their personal experiences in adapting to the changing European regulatory environment.
At the Community Over Code Europe 2024, the annual conference of the Apache Software Foundation, join us for an insightful session on understanding the core principle of ‘Community Over Code’. This talk will delve into how this philosophy shapes the foundation’s approach to software development. We’ll explore the significance of prioritizing a collaborative, inclusive community and how this fosters innovation and sustainability in open source projects. Attendees will learn about the practical implications of this ethos in Apache’s day-to-day operations and its impact on the broader open-source ecosystem.
Monitoring and management go hand in hand in distributed storage systems, and this is true for Apache Cassandra as well. Apache Cassandra has a wide range of monitoring and management API extensions that provide insight into its internal processes and make management operations accessible.
This talk will provide an overview of several new initiatives that are closely related from a software developer’s perspective. These initiatives follow the same design principles and their overall direction can be characterized as a strategic shift in both management and monitoring from JMX to CQL.
Retrieval-Augmented Generation (RAG) is probably one of the most popular implementations of LLMs that integrates retrieval and generation models, augmenting AI’s understanding of text and improving response accuracy through an information database.
This approach tackles the limitations of traditional generation models by fusing retrieval mechanisms, and enriching outputs with contextual depth and external knowledge.
Evaluating applications using LLMs, such as RAG, is pivotal for confidence and improvement, yet faces challenges like subjectiveness with respect to domain-specific suitability.
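To make the retrieval-then-generation flow concrete, here is a minimal sketch of RAG's two stages. The term-overlap scoring and the document texts are purely illustrative; real systems rank with dense embeddings from a language model.

```python
# Toy sketch of RAG: retrieve the best-matching documents, then augment
# the prompt with them before generation. Term overlap stands in for the
# embedding-based similarity a real system would use.
def retrieve(query, documents, k=2):
    """Return the k documents sharing the most terms with the query."""
    terms = set(query.lower().split())
    return sorted(
        documents,
        key=lambda d: len(terms & set(d.lower().split())),
        reverse=True,
    )[:k]

def build_prompt(query, documents):
    """Augment the query with retrieved context before generation."""
    context = "\n".join(retrieve(query, documents))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "Cassandra is a distributed NoSQL database",
    "Solr is a search platform built on Lucene",
    "IoTDB is a time series database for IoT",
]
prompt = build_prompt("what is a distributed database", docs)
```

The evaluation challenge mentioned above starts exactly here: whether the retrieved context was actually relevant is domain-specific and hard to score automatically.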
Apache IoTDB is a time series database focused on IoT workloads. At the end of 2022, major version 1.0 was released, which contained many significant changes, e.g. a completely new cluster module, new export and import options, and new APIs to integrate with Apache IoTDB.
In 2023, three minor releases followed, namely 1.1, 1.2 and 1.3; although they are considered minor releases, they contain many new features.
The significance of responsible and ethical AI systems has gained immense prominence on the global stage, underscoring the escalating recognition of its far-reaching impact on societies worldwide. Lately, diverse groups and individuals have transitioned from relying solely on Free Software licenses for their projects to pioneering new forms of licensing solutions which impose restrictions related to fields of endeavour, behaviour, community management and commercial practices. This practice has now spilled over to creation of suo moto ethics codes for AI, leading to creation of licenses with restrictive characteristics.
The Accord Consensus Protocol provides global, leaderless, single-network-round-trip consensus using commodity clocks.
Research from the University of Michigan & Apple Inc. introduces ACID-compliant, strict serialisable transactions that can run globally at scale, at high throughput, with low latency.
This will be a run-through of
the importance of ACID transactions in Apache Cassandra,
how previous consensus protocols work,
how Accord improves on these to provide its industry-leading characteristics.
Vector-based search gained incredible popularity in the last few years: Large Language Models fine-tuned for sentence similarity proved to be quite effective in encoding text to vectors and representing some of the semantics of sentences in a numerical form.
These vectors can be used to run a K-nearest neighbour search and look for documents/paragraphs close to the query in a n-dimensional vector space, effectively mimicking a similarity search in the semantic space (Apache Solr KNN Query Parser).
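The contract of a KNN search can be shown in a few lines of exact, brute-force code; this is an illustrative sketch only, since Solr and Lucene use an approximate HNSW graph index to make the search tractable at scale.

```python
import math

# Exact (brute-force) k-nearest-neighbour search by cosine similarity
# over a toy 3-dimensional vector space. Document ids and vectors are
# illustrative stand-ins for encoded paragraphs.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query, index, k=2):
    """Rank document vectors by similarity to the query vector."""
    ranked = sorted(index.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.2, 0.1],
    "doc_cars": [0.0, 0.1, 0.9],
}
```

A query vector close to `doc_cars` in this space will rank it first, mimicking a similarity search in the semantic space.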
Apache Mynewt is a community-driven, permissively licensed Open Source initiative for constrained, embedded devices and applications. It provides foundational RTOS, middleware (secure bootloader, filesystem, networking stack, device management) and tooling.
This presentation will cover the history and overall state of the project, its architecture and selected components (e.g. the Bluetooth stack). It will discuss already available features as well as additions planned for the near future.
Last but not least, the state of the community will be presented.
Mentorship and outreach programs are often considered as side projects. Although they are a nice way to spend time and have some fun, one may say they rarely add new long-term contributors to your company project or a community. Is it true? Is it even the main goal? Or is it about team bonding and growing new maintainers and community leaders?
Let’s talk about organizing mentorship programs so that they help to grow your current community and contributors.
Cassandra predicted the fall of Troy and no one heeded her warning. Over the years, we’ve learned a lot at Bloomberg about running Apache Cassandra at scale. In this talk, we’ll discuss some of the mistakes we’ve made using Cassandra, how we found and remedied them, and what you can do to avoid them in the future.
Zomato is the Indian market leader for restaurant aggregation and food delivery.
This is a story of how Zomato leverages the power of Apache Solr at scale, some of the problems we faced and how we tackled them to reach where we are now.
We started with just one Solr instance serving 1,000 queries a day across 10K restaurants, and went on to build and ship a massive, first-in-class search server capable of seamlessly searching 100 million restaurant catalogues (SKUs), 800K+ restaurants and 50K+ unique dishes in 13 different languages at 64 million search queries a day.
In industrial automation, the established way of collecting data from industrial equipment has many issues. Apache provides a number of great ways to avoid these issues.
In this talk, I want to demonstrate some of the issues I have seen and how we can resolve all of them with combinations of some of the amazing projects we have at Apache:
How we can use Apache TsFile to directly collect data on the hardware
Managing an Open Source Program Office (OSPO) team is undoubtedly a unique experience. In this talk I will discuss the unique challenges and best practices for managing software developers in an OSPO environment. I will cover topics such as managing remote teams, maintaining the career progression of OSPO developers, fostering a culture of collaboration and collaborating with the open source community. I will also explore the role of performance metrics in managing software developers in an OSPO, how to align those metrics with the goals of the organisation and how to help developers balance the needs of the organisation with the needs of the open source community.
As Cassandra clusters and users onboard to multi-cloud platforms, users may access Cassandra clusters from on-premise environments as well as from various cloud platforms. Admins may need to restrict certain users/teams to access from certain IP ranges, aka CIDR groups. They may also need to restrict superuser credentials from being used from third-party clouds.
The CIDR filtering authorizer lets admins allow or disallow user access from different CIDR groups, which can help prevent misuse of copied or hacked credentials.
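The core check behind such an authorizer can be sketched with the standard `ipaddress` module. The group names and rule layout below are hypothetical illustrations, not Cassandra's actual configuration schema.

```python
import ipaddress

# Illustrative CIDR-group filtering: each user is allowed to connect only
# from the CIDR groups assigned to them. Names and ranges are made up.
CIDR_GROUPS = {
    "on_prem": ["10.0.0.0/8"],
    "cloud_a": ["203.0.113.0/24"],
}

ALLOWED_GROUPS = {
    "analyst": {"on_prem"},
    "superuser": {"on_prem"},            # superusers blocked outside on-prem
    "service": {"on_prem", "cloud_a"},
}

def is_allowed(user, client_ip):
    """True when the client IP falls in any CIDR group allowed for the user."""
    addr = ipaddress.ip_address(client_ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for group in ALLOWED_GROUPS.get(user, set())
        for cidr in CIDR_GROUPS[group]
    )
```

With rules like these, a stolen superuser credential used from a third-party cloud range is rejected even though the password is valid.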
Vector Search has been regarded as a revolution for Search and Information Retrieval since the breakthrough with BERT and GPT in 2018. There has been an ever growing interest from both academia and industry in using machine learning models and vector search to leverage all sorts of content beyond the lexical features, including images, events, contextual semantics and so on.
At Uber, we added Vector Search support to our Search platform by leveraging Apache Lucene, empowering multiple business-critical use cases such as Semantic Search and Gen AI support.
Apache Druid is a real-time analytics database built for speed and scale, capable of executing complex queries against billions of rows and getting sub-second answers. Druid thrives on highly concurrent workloads, making it ideal for applications like website clickstream analysis, network performance monitoring, or handling vast IoT metrics. By using pre-aggregated data, lightning-fast columnar storage, and parallel processing we can gain insights in real-time.
In this talk, we will share techniques for improving query concurrency and achieving sub-second responses.
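The pre-aggregation idea mentioned above can be sketched in a few lines. The minute granularity, dimension names, and metrics here are illustrative, not Druid's actual ingestion spec.

```python
from collections import defaultdict
from datetime import datetime

# Sketch of ingestion-time rollup: raw events are pre-aggregated into one
# row per (minute, page), so queries touch far fewer rows at read time.
def rollup(events):
    agg = defaultdict(lambda: {"count": 0, "latency_sum": 0.0})
    for ts, page, latency_ms in events:
        minute = ts.replace(second=0, microsecond=0)
        row = agg[(minute, page)]
        row["count"] += 1
        row["latency_sum"] += latency_ms
    return dict(agg)

events = [
    (datetime(2024, 6, 3, 10, 0, 5), "/home", 12.0),
    (datetime(2024, 6, 3, 10, 0, 42), "/home", 18.0),
    (datetime(2024, 6, 3, 10, 1, 1), "/home", 30.0),
]
table = rollup(events)
```

Three raw events collapse to two stored rows here; at clickstream scale the same trade keeps billions of events queryable in sub-second time.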
As developer communities grow both in size and number, it can be easy to lose sight of the importance of community health in the development process. Most tech communities guide their community efforts and future work based on NPS scores and feedback forms. Though well intended, these methods do not provide a full view of the current status of their community members or guarantee they are moving towards a direction that benefits their members or fulfills their needs.
Choosing a compaction strategy for a Cassandra database has historically been a very difficult problem, where making the wrong choice can have lasting effects on performance while making a change later is a time-consuming and costly process.
The Unified Compaction Strategy, introduced with Cassandra 5, is designed to provide a solution to this problem by effectively handling a diverse range of use cases, including those best suited for leveled, tiered, and time-windowed compaction.
A critical aspect of any table format is the rapid identification of files relevant for a query irrespective of the underlying data volume. The focus of this presentation is on the job planning process in Apache Iceberg, highlighting its efficiency and ability to scale to tens of millions of files. This session will explain how the project leverages a hybrid strategy for planning jobs, seamlessly transitioning between local and distributed execution for optimal performance.
An update on what’s happened inside the Apache PLC4X over the last year. What we have achieved and what we are planning on doing for the near and not-so-near future.
From new protocols, updated APIs, new languages, GUI applications right up to even more supported languages and fully generated driver implementations.
This talk delves into the inner workings of the Apache Software Foundation board, shedding light on its workings and the responsibilities of its board members. Attendees will gain a comprehensive understanding of the ASF board’s governance structure, decision-making processes, and its crucial role in overseeing one of the world’s largest groups of open-source communities. Drawing from real-life experiences, the speaker will share personal insights, challenges faced, and the rewarding aspects of contributing to ASF’s mission.
Cassandra 5.0 now incorporates vector search capabilities powered by DiskANN, an advanced technology developed by Microsoft Research. In this session, we will demonstrate the vector search performance of Cassandra 5.0, juxtaposed with other leading databases. Our benchmarking platform will be utilized to assess a variety of key metrics, including I/O performance as well as the precision and recall accuracy of the search results.
Furthermore, the session will delve into optimization strategies for Cassandra.
Apache Ratis is an open source Java library for the Raft Consensus Protocol. Raft is being used successfully as an alternative to Paxos to implement a consistently replicated log. Raft is proven to be safe and is designed to be simpler to understand. Ratis is a high performance implementation of Raft. Apache Ozone, Apache IoTDB and Alluxio use Apache Ratis for providing high availability and replicating raw data.
Ratis implements all the standard Raft features, including leader election, log replication, membership change and log compaction.
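The majority rule that underpins both leader election and log replication can be sketched briefly. This is a didactic fragment of the Raft commit decision, not code from Ratis itself.

```python
# Sketch of Raft's quorum arithmetic: a candidate wins election, and a
# log entry is committed, once a strict majority of the cluster agrees.
def quorum(cluster_size):
    """Smallest strict majority of the cluster."""
    return cluster_size // 2 + 1

def committed_index(match_index):
    """Highest log index replicated on a majority of members.

    match_index holds, per member (leader included), the highest log
    index known to be stored on that member.
    """
    ranked = sorted(match_index, reverse=True)
    return ranked[quorum(len(match_index)) - 1]
```

For a 5-node cluster with match indexes [9, 7, 5, 7, 4], index 7 is stored on three members, so entries up to 7 are safely committed even if two nodes fail.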
Why do you need another API to handle external traffic when you have the stable Kubernetes Ingress API and dozens of implementations? What problems of the Ingress API does the new Gateway API solve? Does this mean the end of the Ingress API?
In this short talk, Navendu will answer these questions by exploring how Gateway APIs evolved and solved the shortcomings of the Ingress API with hands-on examples using Apache APISIX.
In an increasingly interconnected world, the importance of building diverse and inclusive global communities cannot be overstated. This topic explores the journey of taking a local community and expanding its reach to become a vibrant and diverse global community. By examining strategies, best practices, examples of some successful Chinese opensource communities, this session will provide valuable insights into fostering inclusivity, cultural exchange, and collaboration on a global scale.
Participants will gain a deeper understanding of the challenges and opportunities involved in transitioning a local community to the global stage.
Discover the keys to success when releasing a podling within the Apache Incubator. This talk explores the crucial aspects that the incubator PMC looks for in every release, providing practical tips to pass the IPMC vote and move your project closer to graduation.
Learn about the latest incubator and ASF policies, recent updates you may have missed, and the legal requirements of open source licenses. Gain insights into assembling your NOTICE and LICENSE files effectively, while understanding the reasoning behind specific practices.
In the era of explosive data growth, scalability is paramount for any storage solution. This abstract focuses on the scalability aspects of Apache Ozone, a distributed object storage system designed to handle the ever-increasing demands of modern data-intensive applications.
The session will commence with an up-to-date overview of Apache Ozone, providing insights into its current state, recent enhancements, and its pivotal role in addressing the evolving needs of organizations. Attendees will gain a comprehensive understanding of how Apache Ozone offers scalable, high-performance, and future-ready solutions tailored to the challenges posed by today’s data-intensive applications.
All mature tech stacks nowadays offer infrastructure-related capabilities, either in a standard lib or in 3rd-party libraries, e.g., rate limiting and authorization. While it’s great to have such features, it’s impossible to audit them easily: you’d need to be familiar with the stack and dive deep into the code. This approach just doesn’t scale.
A well-designed system keeps the right feature at the right place. In this talk, I’ll go through all steps toward making your system more easily auditable.
The session will highlight the strategies employed by Apache to foster a more diverse and inclusive environment, emphasizing the importance of mentorship in nurturing new talents and perspectives. This approach not only enriches the Apache community ecosystem but also ensures that it reflects the wide array of users it serves. Attendees will gain insights into the practical steps for implementing similar programs and the profound impact of inclusivity on technology development.
In the ever-evolving landscape of open source projects, the Apache Software Foundation (ASF) stands at the forefront of innovation and community-driven development. Two of its young projects, Apache Training and Apache Wayang, are working on an exciting journey of expansion and inclusivity.
This session is dedicated to showcasing how these projects are opening their doors to a broader audience, including non-technical individuals, thereby fostering a more diverse and robust community, which helps the ASF to continue in solving some of the world’s tech problems by bringing people together.
by Zoltan Borok-Nagy, Péter Rózsa & Noémi Pap-Takács
Track: Big Data Storage
Room: Rhapsody
Apache Impala is a distributed, massively parallel query engine for big data. Initially, it focused on fast query execution on top of large datasets that were ingested via long-running batch jobs. The table schema and the ingested data typically remained unchanged, and row-level modifications were impractical to say the least.
Today’s expectations for modern data warehouse engines have risen significantly. Users now want to have RDBMS-like capabilities in their data warehouses.
Years ago the Service-oriented architecture (SOA) architectural style came along with implementations of web services based on standards like the Web Service Description Language (WSDL) and SOAP. Many of these interfaces are still in place today, as a change requires both the provider and all consumers to agree on a new definition and change their implementations (often without any business value). The underlying infrastructure, sometimes based on Enterprise Service Buses (ESBs), is however often end-of-life and hard to maintain.
The rapid growth of the global open source community has led to the expansion of numerous projects, including the establishment of chapters in diverse regions such as Africa. This talk will explore the unique experiences and insights gained from leading an African chapter of the CHAOSS project, highlighting both the challenges faced and the victories achieved along the way. It will discuss the growth of the open source movement in Africa and emphasize the importance of building a diverse and inclusive community.
In the realm of sustainability, grassroots initiatives often emerge as powerful catalysts for change, driven by the collective wisdom of practitioners.
Our organization, a coalition of hundreds of software practitioners, embodies this ethos, operating on the principles of consensus and practical action. The result? Tangible solutions that directly foster meaningful change.
Enter Impact Framework, an open-source tool designed to quantify the environmental impact of software. It takes observations you can easily gather from running systems such as CPU utilisation, page views, installs, prompts and induces them into environmental impacts like carbon, waste, water.
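The observation-to-impact pipeline described above can be sketched as a simple chain of conversions. Both coefficients below are illustrative placeholder numbers, not values shipped by Impact Framework.

```python
# Hedged sketch of inducing an observation (CPU utilisation over a time
# window) into an environmental impact (grams of CO2). The coefficients
# are illustrative assumptions, not real hardware or grid data.
CPU_TDP_WATTS = 65.0          # assumed processor power draw at full load
GRID_G_CO2_PER_KWH = 400.0    # assumed grid carbon intensity

def carbon_grams(cpu_utilisation, duration_hours):
    """Convert a utilisation observation into energy, then into carbon."""
    energy_kwh = CPU_TDP_WATTS * cpu_utilisation * duration_hours / 1000.0
    return energy_kwh * GRID_G_CO2_PER_KWH
```

Half-loaded CPU for two hours yields 0.065 kWh, or 26 g of CO2 under these assumed coefficients; the framework's value is in making each such coefficient explicit and auditable.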
Apache HBase is an open-source non-relational distributed database with multiple components such as ZooKeeper, JournalNodes, HMaster, NameNodes, DataNodes and RegionServers. Managing independent clusters for each use case is operationally heavy and leads to sub-optimal utilization of hardware. Hence, many organizations need a consolidated, managed, multi-tenant HBase cluster with stronger isolation guarantees.
In this talk, we will describe how we approached this problem, the tradeoffs we made, and how we run large-scale multi-tenant HBase clusters with strict isolation guarantees.
This session explores Fineract’s impact on banking transformation in fintech. It analyzes motivators driving core banking system changes, addressing challenges and innovative solutions.
From a client-focused view, it details how Fineract addresses banking sector needs, emphasizing adaptability and strategic advantages globally.
Real success cases and their metrics will demonstrate Fineract’s positive influence, driving innovation across financial landscapes. It also discusses regional fintech challenges and the potential solutions with Fineract as a fundamental piece.
When I started as the Instaclustr Technology Evangelist 7 years ago, I already had a background in computer science R&D and thought I knew a few things about architecting complex distributed systems. But it was still challenging to learn multiple new Apache (and other) Big Data technologies and build and scale realistic demonstration applications for domains such as IoT/logistics, fintech, anomaly detection, geospatial data, data pipelines and a drone delivery application - with streaming machine learning.
With more than 300 ASF projects being built thousands of times by developers and CI machines every day, making informed decisions about where to put the attention to accelerate build and test feedback cycles and increase the stability of the build process requires deep and holistic build data from which actionable insights can be derived. You will learn how Develocity aggregates the build data captured from dozens of Apache projects and >30k builds every week, surfacing surprising and interesting insights about how these projects are built and how the building of the software can be improved.
In the evolving landscape of data platforms, the decoupling of compute and storage has led to the emergence of open data systems free from vendor constraints. However, this shift towards “modularity” brings its own set of challenges. The intricate task of establishing effective access controls within table-format architectures proves to be complex. Despite data residing in the cloud and theoretically accessible from anywhere, the existing friction impedes seamless accessibility.
Enter Whitefox: an open-source initiative inspired by the brilliant principles of Delta-Sharing.
Uncover the pivotal role of a Data Science Product Manager as they conduct a data-driven symphony in a high-volume Fintech environment.
In the world of product management, the role of a Data Science Product Manager stands out as a conductor orchestrating a symphony of insights. Join me in this session as I share firsthand experiences from my journey as a Data Science Product Manager at PayPal, delving into the challenges, successes, and failures that have shaped my approach to leading products in a data-rich environment.
With great data comes great responsibilities! Companies of every scale face issues of managing huge amounts of data spread across various platforms, databases, and applications.
Data federation offers a solution to this problem by integrating and accessing data from various data sources without the need for complex ETL processes or data duplication.
This session will delve into the following key aspects of data federation:
Introduction to Data Federation:
The problem today?
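The core idea of data federation, answering one query from several sources without first copying everything into a warehouse, can be shown with a toy example. The source names and schemas below are illustrative assumptions.

```python
# Toy federation: a single "query" spans two in-memory sources and joins
# them at read time, with no ETL copy into a central store.
crm = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
billing = [{"customer_id": 1, "amount": 120.0},
           {"customer_id": 1, "amount": 30.0}]

def spend_per_customer():
    """Join CRM names with billing totals across both sources."""
    totals = {}
    for row in billing:
        key = row["customer_id"]
        totals[key] = totals.get(key, 0.0) + row["amount"]
    return {c["name"]: totals.get(c["id"], 0.0) for c in crm}
```

A federation engine does this join across live databases and APIs instead of Python lists, but the shape of the problem, resolving one logical query against many physical sources, is the same.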
One of the mainstays of the open source ecosystem are community events. Open Source Summit, All Things Open, Community Over Code… all examples of community events with vitality and influence within open source. But unlike more commercially focused events, community events are not as simple to measure in terms of benefits to organizations that participate. Without sales leads or conversions, how does a commercial organization measure the gains of participation? And for community projects, what’s the return on investment in running a booth or giving talks at such events?
In this session, we will explore the potential of migrating from VMware to Apache CloudStack with KVM. VMware vSphere is a robust cloud infrastructure and management solution that combines vSphere and vRealize Suite, providing automation and operations capabilities for traditional and modern infrastructure and apps. However, the transition to Apache CloudStack can offer enhanced profitability and competitiveness.
We will delve into the benefits of Apache CloudStack, including its cost-effectiveness and open-source nature, and discuss how a gradual migration from VMware vCloud can reduce ownership costs, increase profitability, and enhance competitiveness.
The Digital Public Infrastructure movement has been gaining momentum globally as governments move to DPI-based approaches to create exponential societal outcomes within and across sectors. DPI is composed of open, interoperable technology with transparent, accountable, and participatory governance frameworks to unlock innovation and value at scale. This session will introduce how Apache projects like Fineract recognized as Digital Public Goods are having transformative impact on achieving SDGs.
Through presentation of the work Mifos has been undertaking over the past 12 months we will show how capabilities have been enhanced in Payment Hub EE combined with the power of Fineract to cover new use cases of P2G, Voucher Management and Account Mapping.
This session will introduce a platform created to bridge the existing gaps in data management while removing some of the complexities in existing Big Data ecosystem. The platform is built around a comprehensive data model describing structured entities and their relations. The model is consistently applied across three abstract types of storages - streaming (e.g. Apache Kafka, Google Cloud PubSub), batch (e.g. Hadoop HDFS, S3, Google Cloud Storage) and random-access (e.
by Daniel Augusto Veronezi Salvador, Bryan Lima, João Jandre Paraquetti & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) is a solid option among known cloud orchestration systems, being on the same level as OpenStack, Azure Stack, and others. All of them address the basic needs to create and run a private cloud system; however, ACS’s users have to adopt external solutions for rating/billing the resources consumption, which is native in the other orchestration tools (e.g. OpenStack). This presentation will address the design and efforts of the ACS community to implement a native rating feature that will allow more flexibility and reduce the need for external systems.
Apache Fineract has a wide range of built-in features, but most companies that integrate Fineract into their applications and services still require some customization of existing functionality or add new features. The usual approach is to fork the upstream project on GitHub and start editing the original code right away. This approach has a couple of drawbacks; most notably, after a while of development the customization gets so complex that pulling changes from the upstream repository makes Git conflicts more likely and contributions back to the upstream project very difficult.
In this talk, I’ll walk you through the tricks and best practices to take your data pipeline game to the next level. No boring theory here - we’ll be talking real-world use cases.
We will explore the patterns for data pipelines with Airflow+Spark, Airflow+DBT and Airflow+Polars, how to avoid dependency management on Airflow, and how to reuse DAG templates across your organization.
We will also define the fundamental concepts of a data pipeline, from data lineage, data observability, metadata, data quality and data auditing, to how to integrate them into a data pipeline.
Apache Camel leads a seamless transition, taking control of 1000+ interfaces from Oracle SOA Suite.
Over the last two years, we have driven forward the integration of all retail systems from a centralised and proprietary system into a microservice-oriented architecture based on Apache Camel and Openshift.
The previously centralised gateways are now independent interfaces.
The challenge here was to lift the countless proprietary implementations to a system that is open to all.
CloudStack recently introduced a few hypervisor migration features, to help cloud operators migrate existing VM workloads into CloudStack. In this session, we are going to see how you can migrate instances from external KVM hosts to KVM hosts managed by CloudStack. Also, we are going to see how we can quickly deploy an instance from a previously prepared qcow2 image.
Since the first repayment strategy got introduced, many followed, but there was one thing common in them:
They were hard coding the allocation rules for each transaction type.
By introducing the “Advanced payment allocation” as part of the 1.9.0 release, the idea was to have a repayment strategy capable of:
Supporting dynamic configuration of the allocation rules for transaction types
Supporting configuration of more fine-grained allocation rules for future installments
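A dynamically configured allocation rule boils down to pouring a payment into charge buckets in a configured order. The bucket names and rule shape below are illustrative, not Fineract's actual configuration schema.

```python
# Sketch of a configurable repayment allocation: the payment fills each
# due bucket in the configured order until the money runs out.
def allocate(payment, due, order):
    """Split a payment across due amounts, following the configured order."""
    allocation = {}
    remaining = payment
    for bucket in order:
        paid = min(remaining, due.get(bucket, 0.0))
        allocation[bucket] = paid
        remaining -= paid
    return allocation

result = allocate(
    100.0,
    {"penalty": 10.0, "fee": 20.0, "interest": 30.0, "principal": 100.0},
    ["penalty", "fee", "interest", "principal"],
)
```

Swapping the `order` list changes the strategy without touching code, which is exactly what hard-coded allocation rules per transaction type could not do.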
Geospatial data are ubiquitous, but the difficulty of handling them accurately is often underestimated. Various projects implement their own routines for performing geospatial operations, but not always with awareness of the pitfalls of simple approaches. This talk will present some of the difficulties in mapping the “real world” to digital data. Then we will present some international standards published jointly by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO).
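One classic pitfall of the simple approaches mentioned above is treating latitude and longitude as plane coordinates. A great-circle (haversine) distance, sketched here on a spherical Earth model, shows why: a degree of longitude shrinks as latitude grows.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a spherical Earth model.

    Naive Euclidean distance on raw degrees is badly wrong away from
    the equator, where meridians converge.
    """
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

One degree of longitude spans about 111 km at the equator but only about 56 km at 60° latitude, a factor a flat-plane formula silently ignores (and the spherical model itself is still only an approximation of the standards-defined ellipsoids).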
Apache Camel has been the proven Swiss Army knife of integration for years. In today’s world of workloads moving to the cloud, the need for disparate systems to communicate remains greater than ever. This context makes a Kubernetes Java stack like Quarkus a good fit for implementing Camel routes.
In this session, attendees can first expect a quick reminder of Camel Quarkus basics. Beyond that, some useful day-to-day features will be presented via concrete examples.
by João Jandre Paraquetti, Daniel Augusto Veronezi Salvador, Bryan Lima & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) and KVM are a combination that many organizations decided to adopt. KVM is a widely used hypervisor with a vibrant community and support in different operating system distributions. While developing the KVM plugin functionalities, one normally tries to make use of the full potential of the hypervisor; however, while Libvirt, the toolkit used by ACS to manage KVM VMs, already supports native incremental snapshots, every volume snapshot/backup taken with ACS is a full snapshot/backup.
In this presentation we delve into infrastructure optimization options for supporting the scalability of Fineract.
Key highlights of the session include:
Performance testing: Exploring the newly-introduced capabilities of Fineract that enable drilling down to performance bottlenecks during development and in production.
Performance improvements: Showing infrastructure and configuration changes that can improve Fineract’s response times and throughput under high-load scenarios.
Scalability improvements: Presenting improvements on Fineract’s scalability capabilities, focusing on infrastructure-based scaling velocity improvements.
The session will start by covering the latest developments made in Hive-Iceberg, followed by an overview of the work done to seamlessly integrate Hive and Iceberg, along with a deep dive into the various cool features supported by Hive-Iceberg, ranging from statistics, branching and tagging, compactions and concurrency to much more.
Apache Camel is the leading open-source integration framework that simplifies the integration of various systems and applications. There exists a comprehensive set of Tooling specifically designed to empower Camel developers in their work with Apache Camel within VS Code. These tools facilitate a seamless and efficient development experience, offering robust support and functionalities tailored to the needs of Camel developers.
In my session I would like to rely on the Extension Pack for Apache Camel which contains a set of specific extensions for Camel but also leverages the VS Code ecosystem.
In this session Wei will present how CloudStack 4.19 adds the capability to easily and quickly perform a light-touch integration of networking appliances with Apache CloudStack, allowing for operators and end users to offer a broader range of networking services while empowering end-users to effortlessly deploy their own virtualized network functions (VNFs).
Q&A is one of the most effective ways to obtain knowledge, build connections, and create interaction. In open-source communities, Q&A is particularly crucial. It not only provides a platform for users and developers to collaboratively tackle technical issues and clarify uncertainties but also enhances the sharing and circulation of knowledge. By helping each other in resolving issues, community members forge stronger bonds and jointly advance their projects. Additionally, a robust Q&A system attracts new members, injecting fresh perspectives and energy into the community.
This session explores the integrated use of Apache Toree, YuniKorn, Spark, and Airflow to create efficient, scalable data pipelines. We will start by discussing how Apache Toree provides an interactive analysis environment with Spark via Jupyter Notebook. Then, we’ll discuss using Apache YuniKorn to manage and schedule these computational resources, ensuring system efficiency. Central to our talk, we’ll delve into the role of Apache Spark in large-scale data processing, highlighting its integration with Toree and YuniKorn.
Collaborative governance in software is challenging. This presentation focuses on stakeholder participation which seems limited to those with the technical acumen, tooling expertise, and positions of influence. Yet, evidence shows that great collaboration is dependent on quality divergent thinking balanced with quality convergent thinking. This presentation lays out a strategic framework that curates broader participation by leveraging a landscape of networks and communication channels.
Governance in software development tends to exclude valuable insights from individuals outside the technical sphere.
Apache CloudStack integrates with two major SDN solutions: Tungsten Fabric (OpenSDN) for KVM environments and NSX for VMware ESX environments. In this talk we’ll explore how these integrations were implemented, how to set up ACS Zones with these SDNs, and what capabilities they bring to ACS.
to submit patches to a podling?
to release code to the public?
to maintain trademarks for a podling?
to become a committer on a podling?
This talk explains the common barriers people and projects face in accomplishing their objectives. It explains why The ASF has:
licensing requirements for code submissions and releases,
signing and checksums, download protocols,
voting requirements for releases and project membership,
trademark requirements for web sites and documentation.
Reading file formats efficiently is a crucial part of big data systems - in selective scans data is often only big before hitting the first filter and becomes manageable during the rest of the processing. The talk describes this early stage of query execution in Apache Impala, from reading the bytes of Parquet files on the filesystem to applying predicates and runtime filters on individual rows.
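The skipping idea behind that early stage can be sketched in a few lines. This is a toy Python illustration of min/max statistics pruning, the technique that lets a scanner discard whole row groups before touching individual rows; the names are made up and this is not Impala’s actual code:

```python
# Hypothetical sketch of min/max row-group pruning: skip whole row
# groups whose statistics cannot satisfy the predicate, before any
# per-row work happens. Names are illustrative, not Impala internals.
from dataclasses import dataclass, field

@dataclass
class RowGroup:
    min_val: int
    max_val: int
    rows: list = field(default_factory=list)

def prune(row_groups, lo, hi):
    """Keep only row groups whose [min, max] range can overlap
    the predicate range lo <= value <= hi."""
    return [rg for rg in row_groups
            if rg.max_val >= lo and rg.min_val <= hi]

groups = [
    RowGroup(0, 9, list(range(10))),
    RowGroup(10, 19, list(range(10, 20))),
    RowGroup(20, 29, list(range(20, 30))),
]

# Predicate: value between 12 and 15 -- only the middle group survives,
# so only 10 of 30 rows ever reach the per-row filter stage.
survivors = prune(groups, 12, 15)
print(len(survivors))  # 1
```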
Apache Impala is a distributed massively parallel analytic query engine written in C++ and Java.
In this session, I share best practices for creating bar-raising documentation that guides users through Figma and GitHub templates. To scale best practices in UX research, designers of open source software create design artifacts that help software builders use and improve on the open source code and curated experience offerings. In this talk, I offer examples of how OpenSearch creates a research process that scales, how it documents that process, and how it creates templates that designers and developers in the open source community can use when developing experiences for their users.
Software has matured and is now an integral, key part of society, its infrastructure and economy. Yet, by and large, the industry’s stance on security, reliability and preventing data leaks has fallen way behind. We’re regularly front-page news. So, like all important engineering industries before it, that means politicians all over the world have started to care. And are introducing software regulation.
Europe leads that pack with the, now final, Cyber Resilience Act and the Product Liability Directive.
The path to successful progression through the ranks of an open-source community remains unclear. Historically, the quality and quantity of one’s technical skills have been essential components in progressing through the ranks in OSS communities. Because participants conduct much of this work in coding repositories, the demonstration of technical skills drives outcomes. However, given that individuals do not typically meet face to face, as they would in a conventional organisational setting, various on-line impression management techniques such as self-promotion (i.
The Asynchronous Decision Making techniques commonly used in open source projects enable efficient remote collaboration in teams which have no boss, no schedule, and often no cultural consistency, yet produce world-changing software.
These very efficient collaboration techniques can even work without computers and apply to most types of projects, not just software development.
This talk describes the key elements and tools of the Asynchronous Decision Making process, based on more than twenty-five years of experience in Open Source projects, as well as examples from federated governments, which, interestingly, work in a similar way.
One of the primary challenges of data ingestion is the tradeoff between the latency of data availability for the downstream systems and the extent to which data is optimised for efficient reading. When ingesting continuous incoming data streams with low latency, Apache Flink is a data processing engine that shines. Apache Iceberg is one of the most popular table formats for large tables. To get the best of both worlds, and continuously ingest data and see near real-time changes to tables queried by various engines, tight integration is needed between these two Apache projects.
In this insightful presentation, Aliaksandr will unveil four ingenious tricks to maximize your Apache Airflow experience in the realm of data engineering. Starting with the power of leveraging CSV files to effortlessly create versatile DAGs, Aliaksandr will demonstrate how this flexibility can streamline your pipeline development process. Moving forward, the audience will learn how Google Sheets can be harnessed as a dynamic tool for DAG creation, opening up opportunities for collaboration among team members of varying Airflow proficiency levels.
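The CSV-driven approach boils down to turning rows into task specifications that a DAG factory can consume. A minimal sketch using only the standard library (Airflow itself is deliberately left out, and the column names here are assumptions, not a documented Airflow feature):

```python
import csv
import io

# Hypothetical CSV describing a pipeline: one row per task, with the
# operator type and an optional upstream dependency.
csv_text = """task_id,operator,upstream
extract,BashOperator,
transform,PythonOperator,extract
load,PythonOperator,transform
"""

def build_dag_spec(text):
    """Parse CSV rows into a {task_id: {...}} spec that a DAG factory
    function could turn into real Airflow operators and dependencies."""
    spec = {}
    for row in csv.DictReader(io.StringIO(text)):
        spec[row["task_id"]] = {
            "operator": row["operator"],
            "upstream": [row["upstream"]] if row["upstream"] else [],
        }
    return spec

spec = build_dag_spec(csv_text)
print(spec["transform"]["upstream"])  # ['extract']
```

The same parsing step works unchanged if the rows come from a Google Sheets export instead of a local file, which is what makes the pattern accessible to team members who never touch DAG code.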
As HTTP/3 looks ready, we will review where we are with it in our servers.
The “old” HTTP/2 protocol and the corresponding TLS/SSL support are common to Traffic Server, HTTP Server and Tomcat.
The presentation will briefly explain the new protocol and look at the different implementations of it.
Then the state of HTTP/3 in our three servers, and how to enable it in each of them, will be presented.
A small demo supporting HTTP/3 will be run.
For those of us who already know how important open source is, it can be challenging to persuasively make the case to management, because we assume that everyone already knows the basics. This can work against us, confusing our audience and making us come across as condescending or concerned about irrelevant lofty philosophical points.
In this talk, we take it back to the basics. What does management actually need to know about open source, why it matters, and how to make decisions about consuming open source, contributing to open source, and open sourcing company code?
For over a decade, Apache ZooKeeper has played a crucial role in maintaining configuration information and providing synchronization within distributed systems. Its unique ability to provide these features made it the de facto standard for distributed systems within the Apache community.
Despite its prolific adoption, there is an emerging trend toward eliminating the dependency on ZooKeeper altogether and replacing it with an alternative technology. The most notable example is the KRaft subproject within the Apache Kafka community.
Data enrichment is a critical step in stream processing. Real-time enrichment of streaming data with contextual information adds missing information, improves accuracy, increases trustworthiness, and facilitates better decision-making. Contextual data can be static or dynamic and obtained in various ways - APIs, databases, files and even as a stream. While there are multiple design patterns to perform data enrichment, it is not always obvious when one pattern is preferred over the other.
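The simplest of those patterns, a lookup against static reference data, can be sketched in plain Python (the field names and the in-memory context table are illustrative; in practice the context would come from an API, database, file, or stream):

```python
# Minimal sketch of the reference-data lookup enrichment pattern:
# each streaming event is merged with its matching contextual record.
context = {  # static contextual data, e.g. preloaded from a database
    "sensor-1": {"location": "Berlin"},
    "sensor-2": {"location": "Madrid"},
}

def enrich(event, lookup):
    """Return the event merged with its contextual record, if any.
    Events with no match pass through unchanged."""
    extra = lookup.get(event["sensor_id"], {})
    return {**event, **extra}

events = [{"sensor_id": "sensor-1", "temp": 21.5}]
enriched = [enrich(e, context) for e in events]
print(enriched[0]["location"])  # Berlin
```

When the contextual data is itself dynamic, this static dict gives way to a cache with expiry or a stream-stream join, which is exactly the kind of trade-off the talk compares.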
Apache Tomcat implements the Jakarta Servlet, Jakarta Pages, Jakarta Expression Language, Jakarta WebSocket and Jakarta Authentication specifications. Jakarta EE 11 is due for release in the first half of 2024 with the first stable Tomcat 11 release expected shortly afterwards.
This session will look at the changes in Jakarta EE 11 for the specifications that Tomcat implements and what these changes mean for developers looking to deploy their applications on Tomcat 11.
Open source has grown to the point where people from many different tech career paths can enhance projects with their skills, and it provides jobs for those interested in working with open source. Open source contribution programs give interested individuals a path to becoming professionals.
Outreachy is a paid, remote open source internship program that empowers talent, grows their skills, and prepares them for career growth. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technical industry where they live.
Apache Impala is a distributed massively parallel query engine designed for high-performance querying of large-scale data. There has been a long list of new features recently around supporting Apache Iceberg tables such as reading, writing, time traveling, and so on. However, in a big data environment it is also a must to be performant. Since Impala has been designed to be fast, it has its own way of reading Iceberg tables.
Managing complex applications such as data processing systems on Kubernetes is a formidable challenge even for the most seasoned engineers. Whether you want to build applications that operate themselves or provision infrastructure from Java code, Kubernetes Operators are the way to go.
The Java Operator SDK is a production-ready framework that makes implementing Kubernetes Operators in Java easy. We will give you a run-down on the basics of operators and implementing one from scratch in Java and why this library may be the right choice for your project.
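At the core of every operator is a reconcile loop: observe the actual state of a resource, compare it with the desired state, and act on the difference. A language-neutral sketch of that idea in Python (the Java Operator SDK provides the real machinery around this; the names and the replica-count example here are purely illustrative):

```python
# Illustrative sketch of the reconcile step at the heart of a
# Kubernetes operator: desired state vs. observed state -> action.
def reconcile(desired_replicas, observed_replicas):
    """Return the corrective action a controller would take to
    converge the observed state toward the desired state."""
    if observed_replicas < desired_replicas:
        return ("scale_up", desired_replicas - observed_replicas)
    if observed_replicas > desired_replicas:
        return ("scale_down", observed_replicas - desired_replicas)
    return ("noop", 0)

# The framework invokes this repeatedly on resource events, so the
# function must be idempotent: running it again after convergence
# must do nothing.
print(reconcile(3, 1))  # ('scale_up', 2)
print(reconcile(3, 3))  # ('noop', 0)
```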
The WebAssembly (Wasm) plugin for Apache Traffic Server (ATS) allows WebAssembly modules following the “proxy-wasm” specification to be run on ATS.
The talk will begin by introducing the background and history of plugins and programmability in ATS. I will go over the shortcomings of the current offerings and then introduce the Wasm plugin as an alternative solution. I will then talk about the “proxy-wasm” specification, which describes the support of WebAssembly modules for proxy server software.
There are millions of open source projects people can use and contribute to. Why yours?
Developing an open source project that is valuable to many and widely accepted in an industry requires a lot of care and feeding – and more than just code. Whether your project is brand new or has been around for decades, you need to explain why other people should take the time to learn, use, and potentially contribute to it.
Instaclustr (now part of NetApp) manages hundreds of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of performance engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs.
Apache Arrow has become a de-facto standard for representing large datasets and a very useful tool in any modern data engineering stack. While it allows different technologies to better communicate and share data, the ecosystem around it enables much more!
In this talk we’ll cover how Arrow interacts with Apache Spark - from allowing PySpark to interoperate with other Python data libraries, to building data-driven applications with the recent addition of Spark Connect.
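The core idea Arrow standardizes, a columnar rather than row-oriented layout, can be illustrated in plain Python. This is only the concept, not Arrow’s actual memory representation, and the pivot function below is a made-up helper:

```python
# Toy illustration of row-oriented vs column-oriented layout -- the
# idea behind Arrow's columnar format. (Arrow's real representation
# uses typed contiguous buffers, not Python lists.)
rows = [
    {"id": 1, "score": 5},
    {"id": 2, "score": 7},
    {"id": 3, "score": 9},
]

def to_columns(records):
    """Pivot a list of row dicts into one contiguous list per column."""
    return {key: [r[key] for r in records] for key in records[0]}

columns = to_columns(rows)
# An analytic operation now touches a single contiguous column
# instead of walking every row object:
print(sum(columns["score"]))  # 21
```

Because every engine that speaks Arrow agrees on the column layout, data can move between PySpark, pandas-style libraries, and Spark Connect clients without per-row conversion, which is the interoperability point the talk develops.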
This session explores the use of the FFM API from Java 22 to leverage native library capabilities, in the context of Apache Tomcat. OpenSSL is here being used to provide support for TLS through the JSSE API, without the need to use the tomcat-native wrapper library. Exploratory design of QUIC and HTTP/3 support from OpenSSL 3.3+ is also discussed.
How do you explain your Apache project to people who don’t even know how to download apps onto their phones – and still manage to get them excited about what you’re working on? It’s simple: pretend you’re talking about a movie.
The problem isn’t the project, but how we’ve been talking about it. And now we’re going to fix that.
In this talk, discover how to completely change the narrative about discussing Apache and open source by not actually talking about open source or Apache…but instead using the same principles that marketers use to create excitement around a movie.
The importance of security is increasing sharply nowadays, and this must be reflected in open source projects. Apache Spark and Apache Flink are two of the most widely used Big Data frameworks for data processing. Both offer dozens of external service connectors in which authentication plays an essential role. Each external system handles authentication in a different way, but a common framework can be provided to ease the life of developers.
Data quality plays a crucial role in data engineering to enable efficient and insightful data pipelines at scale. In this session, we will leverage Apache Iceberg as the scalable table format with ACID guarantees, Apache Toree’s interactive computation capabilities, and orchestrate the automated data workflow on Apache Airflow. We will start by talking about how Iceberg can use the column-level statistics stored in its metadata for efficient and reliable data quality validation.
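The stats-based validation idea can be sketched without any Iceberg dependency. This is a hypothetical illustration (the function names are made up): precomputed column statistics, like those Iceberg keeps in table metadata, let a pipeline reject a bad batch without scanning every row:

```python
# Hypothetical sketch of data quality validation against column-level
# statistics rather than raw rows.
def column_stats(values):
    """Compute the min/max/null-count stats a table format might
    already store in its metadata for one column."""
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null),
        "max": max(non_null),
        "nulls": len(values) - len(non_null),
    }

def validate(stats, *, max_allowed, allow_nulls=False):
    """Cheap validation using only precomputed statistics."""
    if not allow_nulls and stats["nulls"] > 0:
        return False
    return stats["max"] <= max_allowed

stats = column_stats([10, 42, 7])
print(validate(stats, max_allowed=100))  # True
```

In a real pipeline an Airflow task would run such a check between ingestion and publication, failing fast on metadata instead of re-reading the data.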
This talk looks at using Groovy for a well-known data-science problem: classifying Iris flowers. It involves solving this problem using the latest deep-learning neural network technologies and has the option of using GraalVM for blazing speed. Groovy provides a data-science environment with the simplicity of Python but using Java-like syntax and your favourite JVM technologies.
In this presentation, we will delve into the important role that Apache Airflow plays in the Outreachy program and its broader influence in closing inclusion gaps within the open source community. We will explore the success stories and transformative experiences of Outreachy contributors, emphasizing how this open source project has created opportunities for people from diverse backgrounds. Our discussion will focus on the power of open source initiatives like Apache Airflow to foster a more inclusive and accessible technology ecosystem.
As machine learning (ML) models increasingly become integral components of modern applications, there is a growing need to deploy them in real-time environments. Apache Spark is a popular open-source framework for large-scale data processing that supports ML tasks, while Kubernetes provides a powerful platform for container orchestration and deployment. However, combining Spark and Kubernetes poses significant challenges, especially when it comes to achieving low latency and high scalability. In this session, we explore optimal approaches for real-time ML with Apache Spark on Kubernetes, including best practices and strategies for efficient model training, deployment, and serving.
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
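A stateless transform of the kind such pipelines run, here a simple masking step, can be sketched in plain Python (the Wasm compilation and broker embedding are out of scope here, and the record shape is an assumption):

```python
# Minimal sketch of a stateless, record-at-a-time transform of the
# kind one could compile to Wasm and run inside the broker: mask an
# email field. No state is kept between records, which is what makes
# the function trivially embeddable and parallelizable.
def mask_email(record):
    user, _, domain = record["email"].partition("@")
    return {**record, "email": user[0] + "***@" + domain}

out = mask_email({"id": 7, "email": "alice@example.com"})
print(out["email"])  # a***@example.com
```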
In this session, Sergio del Amo introduces the Micronaut® framework and demonstrates how the Framework’s unique compile-time approach enables the development of ultra-lightweight Java applications.
Compelling aspects of the Micronaut framework include:
Develop applications with Java, Kotlin, or Apache Groovy
Sub-second startup time
Small processes that can run in as little as 10 MB of JVM heap
No runtime reflection
Dependency injection and AOP
Reflection-free serialization
A database access toolkit that uses ahead-of-time (AoT) compilation to pre-compute queries for repository interfaces.
It takes a village to run an open source project successfully. A village is usually run by its citizens and governed by some elected officials. In open source we call the citizens “users” and the people in charge of a project “maintainers”. To understand the health and sustainability of a project we should take a closer look at the community and not necessarily the code in the first place.
To understand their demographics a village can run a census.
Apache Flink is a powerful open-source stream processing framework that supports unified batch and streaming processing. With its SQL support, Flink has become even more accessible to data analysts and developers who are familiar with SQL.
In this talk, we will provide a short introduction to Apache Flink and explain how you can leverage SQL under the hood. We will cover some of the SQL-specific features of Flink, such as dynamic tables, streaming SQL and support for stateful operations like windowing and pattern recognition.
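What a streaming engine does for a tumbling-window aggregation can be sketched in a few lines. This is a toy model of the bucketing only, not Flink’s implementation (it ignores watermarks, state backends, and out-of-order handling):

```python
# Toy sketch of a tumbling-window count (the idea behind a windowed
# GROUP BY in streaming SQL): assign each event to the window that
# contains its timestamp, then aggregate per (window, key) bucket.
from collections import defaultdict

def tumbling_count(events, size):
    """events: (timestamp, key) pairs -> {(window_start, key): count}"""
    counts = defaultdict(int)
    for ts, key in events:
        window_start = (ts // size) * size  # align to window boundary
        counts[(window_start, key)] += 1
    return dict(counts)

# Timestamps in seconds, 10-second tumbling windows.
events = [(1, "a"), (4, "a"), (12, "a"), (13, "b")]
print(tumbling_count(events, 10))
```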
Kafka Streams, ksqlDB or Flink SQL are popular processing engines that enable us to run SQL queries on top of streaming data. Isn’t it fascinating that we can run SQL queries on top of streaming data as if they were relational tables, or convert a table into a stream of changelog events? This is known as the stream-table duality.
In this talk we will try to understand how it works under the hood using Flink SQL, Kafka connector with Debezium JSON/Avro format.
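One half of the duality, folding a changelog stream back into the table it represents, can be sketched like this (the event tuples are a simplified stand-in for Debezium-style change records):

```python
# Sketch of the stream-table duality: replaying a changelog stream
# of insert/update/delete events yields the current table state.
def materialize(changelog):
    """Fold a sequence of (op, key, value) change events into a table."""
    table = {}
    for op, key, value in changelog:
        if op in ("insert", "update"):
            table[key] = value       # upsert
        elif op == "delete":
            table.pop(key, None)     # tombstone
    return table

changelog = [
    ("insert", "user1", {"name": "Ada"}),
    ("update", "user1", {"name": "Ada L."}),
    ("insert", "user2", {"name": "Grace"}),
    ("delete", "user2", None),
]
print(materialize(changelog))  # {'user1': {'name': 'Ada L.'}}
```

The other half of the duality runs in reverse: every mutation of the table can be emitted as one more event on the changelog, which is how engines like Flink SQL keep the two views consistent.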
Calling all developers with a penchant for fine whiskey! Join Dr. Paul King, VP at Apache Groovy, on a quest to analyze whiskeys produced by the world’s top 86 distilleries to identify the perfect single-malt Scotch.
How will he perform this analysis? By using the traditional and distributed K-means clustering algorithm from various Apache projects. Bottoms up!
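The clustering idea itself fits in a few lines. Here is a minimal one-dimensional K-means sketch in Python, purely illustrative of the algorithm, not the Groovy code or the distributed Apache implementations the talk uses:

```python
# Minimal 1-D K-means: repeatedly (1) assign each point to its nearest
# centroid, (2) move each centroid to the mean of its assigned points.
def kmeans_1d(points, centroids, iterations=10):
    for _ in range(iterations):
        clusters = {c: [] for c in centroids}
        for p in points:
            nearest = min(centroids, key=lambda c: abs(p - c))
            clusters[nearest].append(p)
        # Recompute each centroid; keep it in place if its cluster is empty.
        centroids = [sum(v) / len(v) if v else c
                     for c, v in clusters.items()]
    return sorted(centroids)

# Two obvious groups of "flavour scores"; the centroids find them.
points = [1, 2, 3, 8, 9, 10]
print(kmeans_1d(points, [0, 10]))  # [2.0, 9.0]
```

The real analysis works the same way, just with multi-dimensional flavour vectors per distillery and a distance function over all dimensions.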