Welcome to the session program for Community Over Code EU 2024.

Tuesday June 4, 2024

09:00
09:00 - 09:15.
by Ryan Skraba
Track: Keynote
Room: Melody
Opening Remarks for Community Over Code EU 2024.
09:15
09:15 - 09:45.
by Ruth Ikegah
Track: Keynote
Room: Melody
The rapid growth of the global open source community has led to the expansion of numerous projects, including the establishment of chapters in diverse regions such as Africa. This talk will explore the unique experiences and insights gained from leading an African chapter of the CHAOSS project, highlighting both the challenges faced and the victories achieved along the way. It will discuss the growth of the open source movement in Africa and emphasize the importance of building a diverse and inclusive community.
09:50
09:50 - 10:20.
by Asim Hussain
Track: Keynote
Room: Melody
In the realm of sustainability, grassroots initiatives often emerge as powerful catalysts for change, driven by the collective wisdom of practitioners. Our organization, a coalition of hundreds of software practitioners, embodies this ethos, operating on the principles of consensus and practical action. The result? Tangible solutions that directly foster meaningful change. Enter Impact Framework, an open-source tool designed to quantify the environmental impact of software. It takes observations you can easily gather from running systems, such as CPU utilisation, page views, installs, and prompts, and converts them into environmental impacts like carbon, waste, and water.
10:25
10:25 - 11:10
Morning break & Poster sessions
11:00

Melody

Symphony

Rhapsody

Mirror Lounge

11:10
11:10 - 11:40.
by Mallikarjun Venkataswamyreddy
Track: Big Data Storage
Room: Melody
Apache HBase is an open-source non-relational distributed database with multiple components such as ZooKeeper, JournalNodes, HMaster, NameNodes, DataNodes, and RegionServers. Managing independent clusters for each use case is operationally heavy and makes sub-optimal use of hardware. Hence, many organizations need a consolidated, managed, multi-tenant HBase cluster with stronger isolation guarantees. In this talk, we will discuss how we approached this problem, made tradeoffs, and now run large-scale multi-tenant HBase clusters with strict isolation guarantees.
11:10 - 11:40.
by Tomas Ferreiro
Track: Fintech
Room: Symphony
This session explores Fineract’s impact on banking transformation in fintech. It analyzes motivators driving core banking system changes, addressing challenges and innovative solutions. From a client-focused view, it details how Fineract addresses banking sector needs, emphasizing adaptability and strategic advantages globally. Real success cases and their metrics will demonstrate Fineract’s positive influence, driving innovation across financial landscapes. It also discusses regional fintech challenges and the potential solutions with Fineract as a fundamental piece.
11:10 - 11:40.
by Paul Brebner
Track: Data Engineering
Room: Rhapsody
When I started as the Instaclustr Technology Evangelist 7 years ago, I already had a background in computer science R&D and thought I knew a few things about architecting complex distributed systems. But it was still challenging to learn multiple new Apache (and other) Big Data technologies and build and scale realistic demonstration applications for domains such as IoT/logistics, fintech, anomaly detection, geospatial data, data pipelines and a drone delivery application - with streaming machine learning.
11:10 - 11:40.
by Etienne Studer
Track: Observability
Room: Mirror Lounge
With more than 300 ASF projects being built thousands of times by developers and CI machines every day, making informed decisions about where to put the attention to accelerate build and test feedback cycles and increase the stability of the build process requires deep and holistic build data from which actionable insights can be derived. You will learn how Develocity aggregates the build data captured from dozens of Apache projects and >30k builds every week, surfacing surprising and interesting insights about how these projects are built and how the building of the software can be improved.
11:50
11:50 - 12:20.
by Anton Okolnychyi
Track: Big Data Storage
Room: Melody
A critical aspect of any table format is the rapid identification of files relevant for a query irrespective of the underlying data volume. The focus of this presentation is on the job planning process in Apache Iceberg, highlighting its efficiency and ability to scale to tens of millions of files. This session will explain how the project leverages a hybrid strategy for planning jobs, seamlessly transitioning between local and distributed execution for optimal performance.
11:50 - 12:20.
by Karin Safra
Track: Fintech
Room: Symphony
Uncover the pivotal role of a Data Science Product Manager as they conduct a data-driven symphony in a high-volume Fintech environment. In the world of product management, the role of a Data Science Product Manager stands out as a conductor orchestrating a symphony of insights. Join me in this session as I share firsthand experiences from my journey as a Data Science Product Manager at PayPal, delving into the challenges, successes, and failures that have shaped my approach to leading products in a data-rich environment.
11:50 - 12:20.
by Justin Mclean
Track: Data Engineering
Room: Rhapsody
Welcome to a presentation on Gravitino! Managing metadata can be complex and time-consuming, but Gravitino offers the ultimate solution. It provides a single source of truth for multi-regional data with geo-distributed architecture support. This allows you to store and manage your data in one place, accessible from anywhere globally. With unified data and AI asset management, you get centralized security and data access management, making data protection easier. Gravitino helps you focus more on your data by simplifying tasks and offering these benefits:
11:50 - 12:20.
by Brian Proffitt
Track: Community
Room: Mirror Lounge
Community events are one of the mainstays of the open source ecosystem. Open Source Summit, All Things Open, Community Over Code… all examples of community events with vitality and influence within open source. But unlike more commercially focused events, community events are not as simple to measure in terms of benefits to organizations that participate. Without sales leads or conversions, how does a commercial organization measure the gains of participation? And for community projects, what’s the return on investment in running a booth or giving talks at such events?
12:30
12:30 - 13:00.
by Marco Sinhoreli
Track: CloudStack
Room: Melody
In this session, we will explore the potential of migrating from VMware to Apache CloudStack with KVM. VMware vSphere is a robust cloud infrastructure and management solution that combines vSphere and vRealize Suite, providing automation and operations capabilities for traditional and modern infrastructure and apps. However, the transition to Apache CloudStack can offer enhanced profitability and competitiveness. We will delve into the benefits of Apache CloudStack, including its cost-effectiveness and open-source nature, and discuss how a gradual migration from VMware vCloud can reduce ownership costs, increase profitability, and enhance competitiveness.
12:30 - 13:00.
by David Higgins
Track: Fintech
Room: Symphony
The Digital Public Infrastructure movement has been gaining momentum globally as governments move to DPI-based approaches to create exponential societal outcomes within and across sectors. DPI is composed of open, interoperable technology with transparent, accountable, and participatory governance frameworks to unlock innovation and value at scale. This session will introduce how Apache projects like Fineract, recognized as Digital Public Goods, are having a transformative impact on achieving the SDGs. By presenting the work Mifos has undertaken over the past 12 months, we will show how capabilities have been enhanced in Payment Hub EE, combined with the power of Fineract, to cover new use cases of P2G, Voucher Management and Account Mapping.
12:30 - 13:00.
by Jan Lukavský
Track: Data Engineering
Room: Rhapsody
This session will introduce a platform created to bridge the existing gaps in data management while removing some of the complexities in the existing Big Data ecosystem. The platform is built around a comprehensive data model describing structured entities and their relations. The model is consistently applied across three abstract types of storage - streaming (e.g. Apache Kafka, Google Cloud PubSub), batch (e.g. Hadoop HDFS, S3, Google Cloud Storage) and random-access (e.
12:30 - 13:00.
by JB Onofré
Track: Community
Room: Mirror Lounge
Open-source technology is fundamentally collaborative and transparent in nature, especially thanks to Apache projects and communities. It fosters innovation, flexibility, and community-driven development for more robust and accessible solutions. Learn how the Dremio Unified Analytics Platform can be a core part of your open source data strategy. We’ll review the role of open-source technologies in shaping modern data strategies and the benefits they offer. We’ll also learn how Dremio harnesses open-source tools, including its Apache Iceberg native data catalog that uses Project Nessie, and its foundational use of Apache Arrow for in-memory analytics and Apache Arrow Flight for high-performance data transfer.
13:00
13:00 - 14:00
Lunch
14:00
14:00 - 14:30.
by Daniel Augusto Veronezi Salvador, Bryan Lima, João Jandre Paraquetti & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) is a solid option among known cloud orchestration systems, being on the same level as OpenStack, Azure Stack, and others. All of them address the basic needs to create and run a private cloud system; however, ACS’s users have to adopt external solutions for rating/billing resource consumption, a capability that is native in the other orchestration tools (e.g. OpenStack). This presentation will address the design and efforts of the ACS community to implement a native rating feature that will allow more flexibility and reduce the need for external systems.
14:00 - 14:30.
by Aleksandar Vidakovic
Track: Fintech
Room: Symphony
Apache Fineract has a wide range of built-in features, but most companies that integrate Fineract into their applications and services still need to customize existing functionality or add new features. The usual approach is to fork the upstream project on GitHub and start editing the original code right away. This approach has a couple of drawbacks. In particular, after a while the customization gets so complex that pulling changes from the upstream repository makes Git conflicts more likely and contributions back to the upstream project very difficult.
14:00 - 14:30.
by Riccardo Amadio
Track: Data Engineering
Room: Rhapsody
In this talk, I’ll walk you through the tricks and best practices to take your data pipeline game to the next level. No boring theory here - we’ll be talking real-world use cases. We’ll explore patterns for data pipelines with Airflow+Spark, Airflow+DBT, and Airflow+Polars, how to avoid dependency management on Airflow, and how to reuse DAG templates across an organization. We’ll also define the fundamental concepts of a data pipeline, from data lineage, data observability, metadata, and data quality to data auditing, and show how to integrate them into a data pipeline.
14:00 - 14:30.
by Michael Rambichler
Track: API & Microservices
Room: Mirror Lounge
Apache Camel leads a seamless transition, taking control of 1000+ interfaces from Oracle SOA Suite. Over the last two years, we have driven forward the integration of all retail systems from a centralised and proprietary system into a microservice-oriented architecture based on Apache Camel and OpenShift. The previously centralised gateways are now independent interfaces. The challenge here was to lift the countless proprietary implementations to a system that is open to all.
14:40
14:40 - 15:10.
by Andrija Panic
Track: CloudStack
Room: Melody
CloudStack recently introduced a few hypervisor migration features, to help cloud operators migrate existing VM workloads into CloudStack. In this session, we are going to see how you can migrate instances from external KVM hosts to KVM hosts managed by CloudStack. Also, we are going to see how we can quickly deploy an instance from a previously prepared qcow2 image.
14:40 - 15:10.
by Adam Saghy
Track: Fintech
Room: Symphony
Since the first repayment strategy was introduced, many have followed, but they all had one thing in common: they hard-coded the allocation rules for each transaction type. The idea behind “Advanced payment allocation”, introduced as part of the 1.9.0 release, was to have a repayment strategy that supports dynamic configuration of the allocation rules for transaction types and configuration of more fine-grained allocation rules for future installments.
14:40 - 15:10.
by Martin Desruisseaux
Track: Data Engineering
Room: Rhapsody
Geospatial data are ubiquitous, but the difficulty of handling them accurately is often under-estimated. Various projects implement their own routines for performing geospatial operations, but not always with awareness about the pitfalls of simple approaches. This talk will present some of the difficulties in mapping “real world” to digital data. Then we will present some international standards published jointly by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO).
14:40 - 15:10.
by Alexandre Gallice
Track: API & Microservices
Room: Mirror Lounge
Apache Camel has been the proven Swiss Army knife of integration for years. In today’s world of workloads moving to the cloud, the need for disparate systems to communicate remains stronger than ever. This context makes a Kubernetes Java stack like Quarkus a good fit to implement Camel routes. In this session, attendees can first expect a quick reminder of Camel Quarkus basics. Beyond that, some useful day-to-day features will be presented through concrete examples.
15:20
15:20 - 15:50.
by João Jandre Paraquetti, Daniel Augusto Veronezi Salvador, Bryan Lima & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) and KVM are a combination that many organizations decided to adopt. KVM is a widely used hypervisor with a vibrant community and support in different operating system distributions. While developing the KVM plugin functionalities, one normally tries to make use of the full potential of the hypervisor; however, while Libvirt, the toolkit used by ACS to manage KVM VMs, already supports native incremental snapshots, every volume snapshot/backup taken with ACS is a full snapshot/backup.
15:20 - 15:50.
by Zoltan Mezei
Track: Fintech
Room: Symphony
In this presentation we delve into infrastructure optimization options for supporting the scalability of Fineract. Key highlights of the session include: Performance testing: Exploring the newly-introduced capabilities of Fineract that enable drilling down to performance bottlenecks during development and in production. Performance improvements: Showing infrastructure and configuration changes that can improve Fineract’s response times and throughput under high-load scenarios. Scalability improvements: Presenting improvements on Fineract’s scalability capabilities, focusing on infrastructure-based scaling velocity improvements.
15:20 - 15:50.
by Simhadri Govindappa & Attila Turóczy
Track: Big Data Compute
Room: Rhapsody
The session will start by covering the latest developments in hive-iceberg, followed by an overview of the work done to seamlessly integrate Hive and Iceberg. It will also include a deep dive into the various features supported by hive-iceberg, ranging from statistics, branching, tagging, and compactions to concurrency and much more.
15:20 - 15:50.
by Dominik Jelinek
Track: API & Microservices
Room: Mirror Lounge
Apache Camel is the leading open-source integration framework that simplifies the integration of various systems and applications. There exists a comprehensive set of Tooling specifically designed to empower Camel developers in their work with Apache Camel within VS Code. These tools facilitate a seamless and efficient development experience, offering robust support and functionalities tailored to the needs of Camel developers. In my session I would like to rely on the Extension Pack for Apache Camel which contains a set of specific extensions for Camel but also leverages the VS Code ecosystem.
15:50
15:50 - 16:10
Afternoon break
16:10
16:10 - 16:40.
by Wei Zhou
Track: CloudStack
Room: Melody
In this session Wei will present how CloudStack 4.19 adds the capability to easily and quickly perform a light-touch integration of networking appliances with Apache CloudStack, allowing for operators and end users to offer a broader range of networking services while empowering end-users to effortlessly deploy their own virtualized network functions (VNFs).
16:10 - 16:40.
by Nadia Jiang
Track: Incubator
Room: Symphony
Q&A is one of the most effective ways to obtain knowledge, build connections, and create interaction. In open-source communities, Q&A is particularly crucial. It not only provides a platform for users and developers to collaboratively tackle technical issues and clarify uncertainties but also enhances the sharing and circulation of knowledge. By helping each other in resolving issues, community members forge stronger bonds and jointly advance their projects. Additionally, a robust Q&A system attracts new members, injecting fresh perspectives and energy into the community.
16:10 - 16:40.
by Luciano Resende & Hongyue Zhang
Track: Big Data Compute
Room: Rhapsody
This session explores the integrated use of Apache Toree, YuniKorn, Spark, and Airflow to create efficient, scalable data pipelines. We will start by discussing how Apache Toree provides an interactive analysis environment with Spark via Jupyter Notebook. Then, we’ll discuss using Apache YuniKorn to manage and schedule these computational resources, ensuring system efficiency. Central to our talk, we’ll delve into the role of Apache Spark in large-scale data processing, highlighting its integration with Toree and YuniKorn.
16:10 - 16:40.
by Addie Girouard
Track: Community
Room: Mirror Lounge
Collaborative governance in software is challenging. This presentation focuses on stakeholder participation which seems limited to those with the technical acumen, tooling expertise, and positions of influence. Yet, evidence shows that great collaboration is dependent on quality divergent thinking balanced with quality convergent thinking. This presentation lays out a strategic framework that curates broader participation by leveraging a landscape of networks and communication channels. Governance in software development tends to exclude valuable insights from individuals outside the technical sphere.
16:50
16:50 - 17:20.
by Alexandre Mattioli
Track: CloudStack
Room: Melody
Apache CloudStack integrates with two major SDN solutions, Tungsten Fabric (OpenSDN) for KVM environments and NSX for VMware ESX environments. In this talk we’ll explore how these integrations were implemented, how to set up ACS Zones with these SDNs, and explore their capabilities with regard to ACS.
16:50 - 17:20.
by Craig Russell
Track: Incubator
Room: Symphony
to submit patches to a podling? to release code to the public? to maintain trademarks for a podling? to become a committer on a podling? This talk explains what common barriers are to accomplishing objectives of people and projects. It explains why The ASF has: licensing requirements for code submissions and releases, signing and checksums, download protocols, voting requirements for releases and project membership, trademark requirements for web sites and documentation.
16:50 - 17:20.
by Csaba Ringhofer & Daniel Becker
Track: Big Data Compute
Room: Rhapsody
Reading file formats efficiently is a crucial part of big data systems - in selective scans data is often only big before hitting the first filter and becomes manageable during the rest of the processing. The talk describes this early stage of query execution in Apache Impala, from reading the bytes of Parquet files on the filesystem to applying predicates and runtime filters on individual rows. Apache Impala is a distributed massively parallel analytic query engine written in C++ and Java.
16:50 - 17:20.
by Aparna Sundar
Track: Community
Room: Mirror Lounge
In this session, I share best practices for creating bar-raising documentation that guides users in using Figma and GitHub templates. To scale best practices in UX Research, designers of open source software create various design artifacts that can help software builders use and improve on the open source code and curated experience offerings. In this talk, I offer examples of OpenSearch research processes that can scale, along with documentation and templates that designers and developers in the open source community can use in developing experiences for their users.
17:30
17:30 - 18:30
Birds of a Feather

Wednesday June 5, 2024

09:00
09:00 - 09:15.
by David Nalley
Track: Keynote
Room: Melody
09:20
09:20 - 09:50.
by Dirk-Willem van Gulik
Track: Keynote
Room: Melody
Software has matured and is now an integral, key part of society, its infrastructure and economy. Yet, by and large, the industry’s stance on security, reliability and preventing data leaks has fallen way behind. We’re regularly front-page news. So - like all important engineering industries before it - that means that politicians all over the world have started to care, and are introducing software regulation. Europe leads that pack with the, now final, Cyber Resilience Act and the Product Liability Directive.
09:55
09:55 - 10:25.
by Sherae Daniel
Track: Keynote
Room: Melody
The path to successful progression through the ranks of an open-source community remains unclear. Historically, the quality and quantity of one’s technical skills have been essential components in progressing through the ranks in OSS communities. Because participants conduct much of this work in coding repositories, the demonstration of technical skills drives outcomes. However, given that individuals do not typically meet face to face, as they would in a conventional organisational setting, various on-line impression management techniques such as self-promotion (i.
10:25
10:25 - 11:10
Morning break & Poster sessions
11:10
11:10 - 11:40.
by Bertrand Delacretaz
Track: Community
Room: Melody
The Asynchronous Decision Making techniques commonly used in open source projects enable efficient remote collaboration, in teams which have no boss, no schedule and often no cultural consistency yet produce world-changing software. These very efficient collaboration techniques can even work without computers and apply to most types of projects, not just software development. This talk describes the key elements and tools of the Asynchronous Decision Making process, based on more than twenty five years of experience in Open Source projects, as well as examples from federated governments, which, interestingly, work in a similar way.
11:10 - 11:40.
by Marton Balassi & Peter Vary
Track: Performance Engineering
Room: Symphony
11:10 - 11:40.
by Aliaksandr Sheliustin
Track: Data Engineering
Room: Rhapsody
In this insightful presentation, Aliaksandr will unveil four ingenious tricks to maximize your Apache Airflow experience in the realm of data engineering. Starting with the power of leveraging CSV files to effortlessly create versatile DAGs, Aliaksandr will demonstrate how this flexibility can streamline your pipeline development process. Moving forward, the audience will learn how Google Sheets can be harnessed as a dynamic tool for DAG creation, opening up opportunities for collaboration among team members of varying Airflow proficiency levels.
11:10 - 11:40.
by Jean-frederic Clere
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
As HTTP/3 looks ready, we will look at where we are with it in our servers. The “old” HTTP/2 protocol and the corresponding TLS/SSL are common to Traffic Server, HTTP Server and Tomcat. The presentation will briefly explain the new protocol and look at different implementations of the protocol. Then the state of HTTP/3 in our 3 servers and how to implement HTTP/3 in them will be presented. A small demo supporting HTTP/3 will be run.
11:50
11:50 - 12:20.
by Rich Bowen
Track: Community
Room: Melody
For those of us who already know how important open source is, it can be challenging to persuasively make the case to management, because we assume that everyone already knows the basics. This can work against us, confusing our audience and making us come across as condescending or concerned about irrelevant lofty philosophical points. In this talk, we take it back to the basics. What does management actually need to know about open source, why it matters, and how to make decisions about consuming open source, contributing to open source, and open sourcing company code?
11:50 - 12:20.
by David Kjerrumgaard
Track: Performance Engineering
Room: Symphony
For over a decade, Apache Zookeeper has played a crucial role in maintaining configuration information and providing synchronization within distributed systems. Its unique ability to provide these features made it the de facto standard for distributed systems within the Apache community. Despite its prolific adoption, there is an emerging trend toward eliminating the dependency on Zookeeper altogether and replacing it with an alternative technology. The most notable example is the KRaft subproject within the Apache Kafka community,
11:50 - 12:20.
by Subham Rakshit
Track: Data Engineering
Room: Rhapsody
11:50 - 12:20.
by Mark Thomas
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
Apache Tomcat implements the Jakarta Servlet, Jakarta Pages, Jakarta Expression Language, Jakarta WebSocket and Jakarta Authentication specifications. Jakarta EE 11 is due for release in the first half of 2024 with the first stable Tomcat 11 release expected shortly afterwards. This session will look at the changes in Jakarta EE 11 for the specifications that Tomcat implements and what these changes mean for developers looking to deploy their applications on Tomcat 11.
12:30
12:30 - 13:00.
by Gabor Kaszab & Zoltan Borok-Nagy
Track: Performance Engineering
Room: Symphony
Apache Impala is a distributed massively parallel query engine designed for high-performance querying of large-scale data. There has been a long list of new features recently around supporting Apache Iceberg tables such as reading, writing, time traveling, and so on. However, in a big data environment it is also a must to be performant. Since Impala has been designed to be fast, it has its own way of reading Iceberg tables.
12:30 - 13:00.
by Gyula Fora & Attila Mészáros
Track: Data Engineering
Room: Rhapsody
12:30 - 13:00.
by Shu Kit Chan
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
The WebAssembly (Wasm) plugin for Apache Traffic Server (ATS) allows WebAssembly modules following the “proxy-wasm” specification to be run on ATS. The talk will begin by introducing the background and history of plugins and programmability of ATS. I will go over the shortcomings of the current offerings and then introduce the Wasm plugin as an alternative solution for them. I will then talk about the “proxy-wasm” specification, which describes the support of WebAssembly modules for proxy server software.
13:00
13:00 - 14:00
Lunch
14:00
14:00 - 14:30.
by Brian Proffitt
Track: Community
Room: Melody
There are millions of open source projects people can use and contribute to. Why yours? Developing an open source project that is valuable to many and widely accepted in an industry requires a lot of care and feeding – and more than just code. Whether your project is brand new or has been around for decades, you need to explain why other people should take the time to learn, use, and potentially contribute to it.
14:00 - 14:30.
by Paul Brebner
Track: Performance Engineering
Room: Symphony
Instaclustr (now part of NetApp) manages 100s of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs.
14:00 - 14:30.
by Jarek Potiuk
Track: Data Engineering
Room: Rhapsody
Apache Airflow relies on a silent symphony behind the scenes: its CI/CD (Continuous Integration/Continuous Delivery) and development tooling. This presentation explores the critical role these tools play in keeping Airflow efficient and innovative. We’ll delve into how robust CI/CD ensures bug fixes and improvements are seamlessly integrated, while well-maintained development tools empower developers to contribute effectively. Airflow’s power comes from a well-oiled machine – its CI/CD and development tools. This presentation dives into the world of these often-overlooked heroes.
14:00 - 14:30.
by Remy Maucherat & Jean-frederic Clere
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
This session explores the use of the FFM API from Java 22 to leverage native library capabilities, in the context of Apache Tomcat. OpenSSL is here being used to provide support for TLS through the JSSE API, without the need to use the tomcat-native wrapper library. Exploratory design of QUIC and HTTP/3 support from OpenSSL 3.3+ is also discussed.
14:40
14:40 - 15:10.
by Greg Brown
Track: Community
Room: Melody
How do you explain your Apache project to people who don’t even know how to download apps onto their phones – and still manage to get them excited about what you’re working on? It’s simple: pretend you’re talking about a movie. The problem isn’t the projects, but how we’ve been talking about them. And now we’re going to fix that. In this talk, discover how to completely change the narrative about discussing Apache and open source by not actually talking about open source or Apache…but instead using the same principles that marketers use to create excitement around a movie.
14:40 - 15:10.
by Gabor Somogyi
Track: Big Data Compute
Room: Symphony
14:40 - 15:10.
by Hongyue Zhang & Luciano Resende
Track: Data Engineering
Room: Rhapsody
Data quality plays a crucial role in data engineering to enable efficient and insightful data pipelines at scale. In this session, we will leverage Apache Iceberg as the scalable table format with ACID guarantees and Apache Toree’s interactive computation capabilities, and orchestrate the automated data workflow on Apache Airflow. We will start by talking about how Iceberg can use its column-level statistics stored in metadata for efficient and reliable data quality validation.
14:40 - 15:10.
by Paul King
Track: Groovy
Room: Mirror Lounge
This talk looks at using Groovy for a well-known data-science problem: classifying Iris flowers. It involves solving this problem using the latest deep-learning neural network technologies and has the option of using GraalVM for blazing speed. Groovy provides a data-science environment with the simplicity of Python but using Java-like syntax and your favourite JVM technologies.
15:20
15:20 - 15:50.
by Edith Puclla
Track: Community
Room: Melody
In this presentation, we will delve into the important role that Apache Airflow plays in the Outreachy program and its broader influence in closing inclusion gaps within the open source community. We will explore the success stories and transformative experiences of Outreachy contributors, emphasizing how this open source project has created opportunities for people from diverse backgrounds. Our discussion will focus on the power of open source initiatives like Apache Airflow to foster a more inclusive and accessible technology ecosystem.
15:20 - 15:50.
by Hichem Kenniche
Track: Big Data Compute
Room: Symphony
As machine learning (ML) models increasingly become integral components of modern applications, there is a growing need to deploy them in real-time environments. Apache Spark is a popular open-source framework for large-scale data processing that supports ML tasks, while Kubernetes provides a powerful platform for container orchestration and deployment. However, combining Spark and Kubernetes poses significant challenges, especially when it comes to achieving low latency and high scalability. In this session, we explore optimal approaches for real-time ML with Apache Spark on Kubernetes, including best practices and strategies for efficient model training, deployment, and serving.
15:20 - 15:50.
by Christina Lin
Track: Data Engineering
Room: Rhapsody
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
15:20 - 15:50.
by Sergio del Amo
Track: Groovy
Room: Mirror Lounge
In this session, Sergio del Amo introduces the Micronaut® framework and demonstrates how the framework’s unique compile-time approach enables the development of ultra-lightweight Java applications. Compelling aspects of the Micronaut framework include: developing applications with Java, Kotlin, or Apache Groovy; sub-second startup time; small processes that can run in as little as 10 MB of JVM heap; no runtime reflection; dependency injection and AOP; reflection-free serialization; and a database access toolkit that uses ahead-of-time (AoT) compilation to pre-compute queries for repository interfaces.
16:00
16:00 - 16:30.
by Niklas Merz
Track: Community
Room: Melody
It takes a village to run an open source project successfully. A village is usually run by its citizens and governed by some elected officials. In open source we call the citizens “users” and the people in charge of a project “maintainers”. To understand the health and sustainability of a project we should take a closer look at the community and not necessarily the code in the first place. To understand their demographics a village can run a census.
16:00 - 16:30.
by Muhammet Orazov
Track: Big Data Compute
Room: Rhapsody
16:00 - 16:30.
by Paul King
Track: Groovy
Room: Mirror Lounge
Calling all developers with a penchant for fine whiskey! Join Dr. Paul King, VP at Apache Groovy, on a quest to analyze whiskeys produced by the world’s top 86 distilleries to identify the perfect single-malt Scotch. How will he perform this analysis? By using the traditional and distributed K-means clustering algorithm from various Apache projects. Bottoms up!
16:45
16:45 - 17:20
Lightning Talks & Wrap-up