Welcome to the session program for Community Over Code EU 2024.

Monday June 3, 2024

09:00
09:20
11:10 - 11:40.
by Ana Jimenez Santamaria, Floor Drees, Natali Vlatko & Mirko Boehm
Track: Keynote
Room: Melody
The panel will discuss how EU legislation affects the daily work of open source operations (upstream contribution to open source projects, open source compliance, etc.), focusing on how these laws impact open source professionals working in OSPOs or similar entities. Panelists will cover some of the recent policy updates, the challenges of staying compliant when managing open source contribution and usage within organizations, and their personal experiences in adapting to the changing European regulatory environment.
10:25
10:25 - 11:10
Morning break & Poster sessions
11:00

Melody

Symphony

Rhapsody

Mirror Lounge

11:10
11:10 - 11:40.
by Kanchana Welagedara
Track: Community
Room: Melody
At the Community Over Code Europe 2024, the annual conference of the Apache Software Foundation, join us for an insightful session on understanding the core principle of ‘Community Over Code’. This talk will delve into how this philosophy shapes the foundation’s approach to software development. We’ll explore the significance of prioritizing a collaborative, inclusive community and how this fosters innovation and sustainability in open source projects. Attendees will learn about the practical implications of this ethos in Apache’s day-to-day operations and its impact on the broader open-source ecosystem.
11:10 - 11:40.
by Maxim Muzafarov
Track: Cassandra
Room: Symphony
Monitoring and management go hand in hand in distributed storage systems, and this is true for Apache Cassandra as well. Apache Cassandra has a wide range of monitoring and management API extensions that provide insight into its internal processes and make management operations accessible. This talk will provide an overview of several new initiatives that are closely related from a software developer’s perspective. These initiatives follow the same design principles and their overall direction can be characterized as a strategic shift in both management and monitoring from JMX to CQL.
11:10 - 11:40.
by Julian Feinauer
Track: IoT
Room: Mirror Lounge
Apache IoTDB is a time series database focused on IoT workloads. At the end of 2022, the major version 1.0 was released, which contained many major changes, e.g. a completely new cluster module, new export and import options, and new APIs to integrate with Apache IoTDB. In 2023, three minor releases (1.1, 1.2 and 1.3) followed; although they are considered minor releases, they contain many new features.
11:50
11:50 - 12:20.
by Niharika Singhal
Track: Community
Room: Melody
The significance of responsible and ethical AI systems has gained immense prominence on the global stage, underscoring the escalating recognition of its far-reaching impact on societies worldwide. Lately, diverse groups and individuals have transitioned from relying solely on Free Software licenses for their projects to pioneering new forms of licensing solutions which impose restrictions related to fields of endeavour, behaviour, community management and commercial practices. This practice has now spilled over into the creation of suo motu ethics codes for AI, leading to the creation of licenses with restrictive characteristics.
11:50 - 12:20.
by Mick Semb Wever
Track: Cassandra
Room: Symphony
The Accord Consensus Protocol provides global leaderless single-network-round-trip consensus using commodity clocks. Research from University of Michigan & Apple Inc. introduces ACID-compliant, strict serialisable transactions that can run globally at scale, at high throughput, with low latency. This will be a run-through of the importance of ACID transactions in Apache Cassandra, how previous consensus protocols work, and how Accord improves on these to provide its industry-leading characteristics.
11:50 - 12:20.
by Szymon Janc
Track: IoT
Room: Mirror Lounge
Apache Mynewt is a community-driven, permissively licensed Open Source initiative for constrained, embedded devices and applications. It provides a foundational RTOS, middleware (secure bootloader, filesystem, networking stack, device management) and tooling. This presentation will cover the history and overall state of the project, its architecture and selected components (e.g. the Bluetooth stack). It will discuss already available features as well as additions planned for the near future. Last but not least, the state of the community will be presented.
12:30
12:30 - 13:00.
by Oleg Nenashev
Track: Community
Room: Melody
Mentorship and outreach programs are often considered as side projects. Although they are a nice way to spend time and have some fun, one may say they rarely add new long-term contributors to your company project or a community. Is it true? Is it even the main goal? Or is it about team bonding and growing new maintainers and community leaders? Let’s talk about organizing mentorship programs so that they help to grow your current community and contributors.
12:30 - 13:00.
by Lindsey Zurovchak
Track: Cassandra
Room: Symphony
Cassandra predicted the fall of Troy and no one heeded her warning. Over the years, we’ve learned a lot at Bloomberg about running Apache Cassandra at scale. In this talk, we’ll discuss some of the mistakes we’ve made using Cassandra, how we found and remedied them, and what you can do to avoid them in the future.
12:30 - 13:00.
by Christofer Dutz
Track: IoT
Room: Mirror Lounge
In industrial automation, the established way of collecting data from industrial equipment has many issues, and Apache provides a number of great ways to avoid them. In this talk, I want to demonstrate some of the issues I have seen and how we can resolve all of them with combinations of some of the amazing projects we have at Apache: How we can use Apache TsFile to directly collect data on the hardware
13:00
13:00 - 14:00
Lunch
14:00
14:00 - 14:30.
by Ahmed Sobeh
Track: Community
Room: Melody
Managing an Open Source Program Office (OSPO) team is undoubtedly a unique experience. In this talk I will discuss the unique challenges and best practices for managing software developers in an OSPO environment. I will cover topics such as managing remote teams, maintaining the career progression of OSPO developers, fostering a culture of collaboration and collaborating with the open source community. I will also explore the role of performance metrics in managing software developers in an OSPO, how to align those metrics with the goals of the organisation and how to help developers balance the needs of the organisation with the needs of the open source community.
14:00 - 14:30.
by Shailaja Koppu
Track: Cassandra
Room: Symphony
As Cassandra clusters and users onboard to multi-cloud platforms, users may access Cassandra clusters from on-premise and from various cloud platforms. Admins may need to restrict certain users or teams to access from certain IP ranges, also known as CIDR groups. They may also need to restrict the use of superuser credentials from third-party clouds. The CIDR filtering authorizer lets admins allow or disallow user access from different CIDR groups, which can help in preventing misuse of copied or hacked credentials.
14:00 - 14:30.
by Kyle Hoondert
Track: IoT
Room: Mirror Lounge
Apache Druid is a real-time analytics database built for speed and scale, capable of executing complex queries against billions of rows and getting sub-second answers. Druid thrives on highly concurrent workloads, making it ideal for applications like website clickstream analysis, network performance monitoring, or handling vast IoT metrics. By using pre-aggregated data, lightning-fast columnar storage, and parallel processing we can gain insights in real-time. In this talk, we will share techniques for improving query concurrency and achieving sub-second responses.
14:40
14:40 - 15:10.
by Juan Pablo Flores
Track: Community
Room: Melody
As developer communities grow both in size and number, it can be easy to lose sight of the importance of community health in the development process. Most tech communities guide their community efforts and future work based on NPS scores and feedback forms. Though well intended, these methods do not provide a full view of the current status of their community members or guarantee they are moving towards a direction that benefits their members or fulfills their needs.
14:40 - 15:10.
by Branimir Lambov
Track: Cassandra
Room: Symphony
Choosing a compaction strategy for a Cassandra database has historically been a very difficult problem, where making the wrong choice can have lasting effects on performance while making a change later is a time-consuming and costly process. The Unified Compaction Strategy, introduced with Cassandra 5, is designed to provide a solution to this problem by effectively handling a diverse range of use cases, including those best suited for leveled, tiered, and time-windowed compaction.
14:40 - 15:10.
by Anton Okolnychyi
Track: Big Data Storage
Room: Rhapsody
A critical aspect of any table format is the rapid identification of files relevant for a query irrespective of the underlying data volume. The focus of this presentation is on the job planning process in Apache Iceberg, highlighting its efficiency and ability to scale to tens of millions of files. This session will explain how the project leverages a hybrid strategy for planning jobs, seamlessly transitioning between local and distributed execution for optimal performance.
14:40 - 15:10.
by Lukas Ott
Track: IoT
Room: Mirror Lounge
An update on what has happened inside the Apache PLC4X project over the last year: what we have achieved and what we are planning for the near and not-so-near future, from new protocols, updated APIs, new languages and GUI applications right up to even more supported languages and fully generated driver implementations.
15:20
15:20 - 15:50.
by Justin Mclean & Christofer Dutz
Track: Community
Room: Melody
This talk delves into the inner workings of the Apache Software Foundation board, shedding light on its workings and the responsibilities of its board members. Attendees will gain a comprehensive understanding of the ASF board’s governance structure, decision-making processes, and its crucial role in overseeing one of the world’s largest groups of open-source communities. Drawing from real-life experiences, the speaker will share personal insights, challenges faced, and the rewarding aspects of contributing to ASF’s mission.
15:20 - 15:50.
by Uri Smiley & Alexander Laye
Track: Cassandra
Room: Symphony
Cassandra 5.0 now incorporates vector search capabilities powered by DiskANN, an advanced technology developed by Microsoft Research. In this session, we will demonstrate the vector search performance of Cassandra 5.0, juxtaposed with other leading databases. Our benchmarking platform will be utilized to assess a variety of key metrics, including I/O performance as well as the precision and recall accuracy of the search results. Furthermore, the session will delve into optimization strategies for Cassandra.
15:20 - 15:50.
by Tsz-Wo Nicholas Sze
Track: Big Data Storage
Room: Rhapsody
Apache Ratis is an open source Java library for the Raft Consensus Protocol. Raft is being used successfully as an alternative to Paxos to implement a consistently replicated log. Raft is proven to be safe and is designed to be simpler to understand. Ratis is a high performance implementation of Raft. Apache Ozone, Apache IoTDB and Alluxio use Apache Ratis for providing high availability and replicating raw data. Ratis implements all the standard Raft features, including leader election, log replication, membership change and log compaction.
15:20 - 15:50.
by Navendu Pottekkat
Track: API & Microservices
Room: Mirror Lounge
Why do you need another API to handle external traffic when you have the stable Kubernetes Ingress API and dozens of implementations? What problems of the Ingress API does the new Gateway API solve? Does this mean the end of the Ingress API? In this short talk, Navendu will answer these questions by exploring how Gateway APIs evolved and solved the shortcomings of the Ingress API with hands-on examples using Apache APISIX.
15:50
15:50 - 16:10
Afternoon break
16:10
16:10 - 16:40.
by Trista Pan
Track: Community
Room: Melody
In an increasingly interconnected world, the importance of building diverse and inclusive global communities cannot be overstated. This topic explores the journey of taking a local community and expanding its reach to become a vibrant and diverse global community. By examining strategies, best practices, and examples of some successful Chinese open source communities, this session will provide valuable insights into fostering inclusivity, cultural exchange, and collaboration on a global scale. Participants will gain a deeper understanding of the challenges and opportunities involved in transitioning a local community to the global stage.
16:10 - 16:40.
by Justin Mclean
Track: Incubator
Room: Symphony
Discover the keys to success when releasing a podling within the Apache Incubator. This talk explores the crucial aspects that the incubator PMC looks for in every release, providing practical tips to pass the IPMC vote and move your project closer to graduation. Learn about the latest incubator and ASF policies, recent updates you may have missed, and the legal requirements of open source licenses. Gain insights into assembling your NOTICE and LICENSE files effectively, while understanding the reasoning behind specific practices.
16:10 - 16:40.
by Uma Maheswara Rao Gangumalla & Ritesh Shukla
Track: Big Data Storage
Room: Rhapsody
In the era of explosive data growth, scalability is paramount for any storage solution. This abstract focuses on the scalability aspects of Apache Ozone, a distributed object storage system designed to handle the ever-increasing demands of modern data-intensive applications. The session will commence with an up-to-date overview of Apache Ozone, providing insights into its current state, recent enhancements, and its pivotal role in addressing the evolving needs of organizations. Attendees will gain a comprehensive understanding of how Apache Ozone offers scalable, high-performance, and future-ready solutions tailored to the challenges posed by today’s data-intensive applications.
16:10 - 16:40.
by Nicolas Fränkel
Track: API & Microservices
Room: Mirror Lounge
All mature tech stacks nowadays offer infrastructure-related capabilities, e.g., rate-limiting and authorization, either in the standard library or in third-party libraries. While it’s great to have such features, it’s impossible to audit them easily. You’d need to be familiar with the stack and dive deep into the code. This approach just doesn’t scale. A well-designed system keeps the right feature at the right place. In this talk, I’ll go through all steps toward making your system more easily auditable.
16:50
16:50 - 17:20.
by Kanchana Welagedara
Track: Community
Room: Melody
The session will highlight the strategies employed by Apache to foster a more diverse and inclusive environment, emphasizing the importance of mentorship in nurturing new talents and perspectives. This approach not only enriches the Apache community ecosystem but also ensures that it reflects the wide array of users it serves. Attendees will gain insights into the practical steps for implementing similar programs and the profound impact of inclusivity on technology development.
16:50 - 17:20.
by Mirko Kämpf & Gláucia Esppenchutz
Track: Incubator
Room: Symphony
In the ever-evolving landscape of open source projects, the Apache Software Foundation (ASF) stands at the forefront of innovation and community-driven development. Two of its young projects, Apache Training and Apache Wayang, are embarking on an exciting journey of expansion and inclusivity. This session is dedicated to showcasing how these projects are opening their doors to a broader audience, including non-technical individuals, thereby fostering a more diverse and robust community, which helps the ASF to continue solving some of the world’s tech problems by bringing people together.
16:50 - 17:20.
by Zoltan Borok-Nagy, Péter Rózsa & Noémi Pap-Takács
Track: Big Data Storage
Room: Rhapsody
Apache Impala is a distributed, massively parallel query engine for big data. Initially, it focused on fast query execution on top of large datasets that were ingested via long-running batch jobs. The table schema and the ingested data typically remained unchanged, and row-level modifications were impractical to say the least. Today’s expectations for modern data warehouse engines have risen significantly. Users now want to have RDBMS-like capabilities in their data warehouses.
16:50 - 17:20.
by Dennis Kieselhorst
Track: API & Microservices
Room: Mirror Lounge
Years ago, the service-oriented architecture (SOA) style came along with implementations of web services based on standards like the Web Service Description Language (WSDL) and SOAP. Many of these interfaces are still in place today, as a change requires both the provider and all consumers to agree on a new definition and change the implementation (often without any business value). The underlying infrastructure, sometimes based on Enterprise Service Buses (ESBs), is however often end-of-life and hard to maintain.
17:30
17:30 - 18:30
Birds of a Feather

Tuesday June 4, 2024

09:00
09:20
09:20 - 09:50.
by Ruth Ikegah
Track: Keynote
Room: Melody
The rapid growth of the global open source community has led to the expansion of numerous projects, including the establishment of chapters in diverse regions such as Africa. This talk will explore the unique experiences and insights gained from leading an African chapter of the CHAOSS project, highlighting both the challenges faced and the victories achieved along the way. It will discuss the growth of the open source movement in Africa and emphasize the importance of building a diverse and inclusive community.
09:55
09:50 - 10:25.
by Asim Hussain
Track: Keynote
Room: Melody
In the realm of sustainability, grassroots initiatives often emerge as powerful catalysts for change, driven by the collective wisdom of practitioners. Our organization, a coalition of hundreds of software practitioners, embodies this ethos, operating on the principles of consensus and practical action. The result? Tangible solutions that directly foster meaningful change. Enter Impact Framework, an open-source tool designed to quantify the environmental impact of software. It takes observations you can easily gather from running systems, such as CPU utilisation, page views, installs and prompts, and converts them into environmental impacts like carbon, waste and water.
10:25
10:25 - 11:10
Morning break & Poster sessions
11:10
11:10 - 11:40.
by Mallikarjun Venkataswamyreddy
Track: Big Data Storage
Room: Melody
Apache HBase is an open-source non-relational distributed database with multiple components such as ZooKeeper, JournalNodes, HMaster, NameNodes, DataNodes, and RegionServers. Managing independent clusters for each use case is operationally heavy and makes sub-optimal use of hardware. Hence, many organizations need a consolidated, managed, multi-tenant HBase cluster with stronger isolation guarantees. In this talk, we are going to talk about how we approached this problem, made tradeoffs, and run large-scale multi-tenant HBase clusters with strict isolation guarantees.
11:10 - 11:40.
by Tomas Ferreiro
Track: Fintech
Room: Symphony
This session explores Fineract’s impact on banking transformation in fintech. It analyzes motivators driving core banking system changes, addressing challenges and innovative solutions. From a client-focused view, it details how Fineract addresses banking sector needs, emphasizing adaptability and strategic advantages globally. Real success cases and their metrics will demonstrate Fineract’s positive influence, driving innovation across financial landscapes. It also discusses regional fintech challenges and the potential solutions with Fineract as a fundamental piece.
11:10 - 11:40.
by Paul Brebner
Track: Data Engineering
Room: Rhapsody
When I started as the Instaclustr Technology Evangelist 7 years ago, I already had a background in computer science R&D and thought I knew a few things about architecting complex distributed systems. But it was still challenging to learn multiple new Apache (and other) Big Data technologies and build and scale realistic demonstration applications for domains such as IoT/logistics, fintech, anomaly detection, geospatial data, data pipelines and a drone delivery application - with streaming machine learning.
11:10 - 11:40.
by Etienne Studer
Track: Observability
Room: Mirror Lounge
With more than 300 ASF projects being built thousands of times by developers and CI machines every day, making informed decisions about where to put the attention to accelerate build and test feedback cycles and increase the stability of the build process requires deep and holistic build data from which actionable insights can be derived. You will learn how Develocity aggregates the build data captured from dozens of Apache projects and >30k builds every week, surfacing surprising and interesting insights about how these projects are built and how the building of the software can be improved.
11:50
11:50 - 12:20.
by Riccardo Amadio
Track: Big Data Storage
Room: Melody
In the evolving landscape of data platforms, the decoupling of compute and storage has led to the emergence of open data systems free from vendor constraints. However, this shift towards “modularity” brings its own set of challenges. The intricate task of establishing effective access controls within table-format architectures proves to be complex. Despite data residing in the cloud and theoretically accessible from anywhere, the existing friction impedes seamless accessibility. Enter Whitefox: an open-source initiative inspired by the brilliant principles of Delta-Sharing.
11:50 - 12:20.
by Karin Safra
Track: Fintech
Room: Symphony
Uncover the pivotal role of a Data Science Product Manager as they conduct a data-driven symphony in a high-volume Fintech environment. In the world of product management, the role of a Data Science Product Manager stands out as a conductor orchestrating a symphony of insights. Join me in this session as I share firsthand experiences from my journey as a Data Science Product Manager at PayPal, delving into the challenges, successes, and failures that have shaped my approach to leading products in a data-rich environment.
11:50 - 12:20.
by Akshat Mathur
Track: Data Engineering
Room: Rhapsody
With great data comes great responsibilities! Companies of every scale face issues of managing huge amounts of data spread across various platforms, databases, and applications. Data federation offers a solution to this problem by integrating and accessing data from various data sources without the need for complex ETL processes or data duplication. This session will delve into the following key aspects of data federation: Introduction to Data Federation: The problem today?
11:50 - 12:20.
by Brian Proffitt
Track: Community
Room: Mirror Lounge
Community events are one of the mainstays of the open source ecosystem. Open Source Summit, All Things Open, Community Over Code… all are examples of community events with vitality and influence within open source. But unlike more commercially focused events, community events are not as simple to measure in terms of benefits to organizations that participate. Without sales leads or conversions, how does a commercial organization measure the gains of participation? And for community projects, what’s the return on investment in running a booth or giving talks at such events?
12:30
12:30 - 13:00.
by Marco Sinhoreli
Track: CloudStack
Room: Melody
In this session, we will explore the potential of migrating from VMware to Apache CloudStack with KVM. VMware vSphere is a robust cloud infrastructure and management solution that combines vSphere and vRealize Suite, providing automation and operations capabilities for traditional and modern infrastructure and apps. However, the transition to Apache CloudStack can offer enhanced profitability and competitiveness. We will delve into the benefits of Apache CloudStack, including its cost-effectiveness and open-source nature, and discuss how a gradual migration from VMware vCloud can reduce ownership costs, increase profitability, and enhance competitiveness.
12:30 - 13:00.
by David Higgins
Track: Fintech
Room: Symphony
The Digital Public Infrastructure movement has been gaining momentum globally as governments move to DPI-based approaches to create exponential societal outcomes within and across sectors. DPI is composed of open, interoperable technology with transparent, accountable, and participatory governance frameworks to unlock innovation and value at scale. This session will introduce how Apache projects like Fineract, recognized as Digital Public Goods, are having a transformative impact on achieving the SDGs. By presenting the work Mifos has undertaken over the past 12 months, we will show how capabilities have been enhanced in Payment Hub EE, combined with the power of Fineract, to cover new use cases of P2G, Voucher Management and Account Mapping.
12:30 - 13:00.
by Jan Lukavský
Track: Data Engineering
Room: Rhapsody
This session will introduce a platform created to bridge the existing gaps in data management while removing some of the complexities in the existing Big Data ecosystem. The platform is built around a comprehensive data model describing structured entities and their relations. The model is consistently applied across three abstract types of storage - streaming (e.g. Apache Kafka, Google Cloud PubSub), batch (e.g. Hadoop HDFS, S3, Google Cloud Storage) and random-access (e.
13:00
13:00 - 14:00
Lunch
14:00
14:00 - 14:30.
by Daniel Augusto Veronezi Salvador, Bryan Lima, João Jandre Paraquetti & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) is a solid option among known cloud orchestration systems, being on the same level as OpenStack, Azure Stack, and others. All of them address the basic needs to create and run a private cloud system; however, ACS’s users have to adopt external solutions for rating/billing resource consumption, which is native in the other orchestration tools (e.g. OpenStack). This presentation will address the design and efforts of the ACS community to implement a native rating feature that will allow more flexibility and reduce the need for external systems.
14:00 - 14:30.
by Aleksandar Vidakovic
Track: Fintech
Room: Symphony
Apache Fineract has a wide range of built-in features, but most companies that integrate Fineract into their applications and services still need to customize existing functionality or add new features. The usual approach is to fork the upstream project on GitHub and start editing the original code right away. This approach has a couple of drawbacks; in particular, after a while the customization gets so complex that pulling changes from the upstream repository makes Git conflicts more likely and contributions back to the upstream project very difficult.
14:00 - 14:30.
by Riccardo Amadio
Track: Data Engineering
Room: Rhapsody
In this talk, I’ll walk you through the tricks and best practices to take your data pipeline game to the next level. No boring theory here - we’ll be talking real-world use cases. We will explore the patterns for data pipelines with Airflow+Spark, Airflow+DBT, and Airflow+Polars, how to avoid dependency management on Airflow, and how to reuse DAG templates across our organization. We will also define the fundamental concepts of a data pipeline, including Data Lineage, Data Observability, Metadata, Data quality, and Data auditing, and how to integrate them into a data pipeline.
14:00 - 14:30.
by Michael Rambichler
Track: API & Microservices
Room: Mirror Lounge
Apache Camel leads a seamless transition, taking control of 1000+ interfaces from Oracle SOA Suite. Over the last two years, we have driven forward the integration of all retail systems from a centralised and proprietary system into a microservice-oriented architecture based on Apache Camel and OpenShift. The previously centralised gateways are now independent interfaces. The challenge here was to lift the countless proprietary implementations to a system that is open to all.
14:40
14:40 - 15:10.
by Andrija Panic
Track: CloudStack
Room: Melody
CloudStack recently introduced a few hypervisor migration features, to help cloud operators migrate existing VM workloads into CloudStack. In this session, we are going to see how you can migrate instances from external KVM hosts to KVM hosts managed by CloudStack. Also, we are going to see how we can quickly deploy an instance from a previously prepared qcow2 image.
14:40 - 15:10.
by Adam Saghy
Track: Fintech
Room: Symphony
Since the first repayment strategy was introduced, many have followed, but they all had one thing in common: they hard-coded the allocation rules for each transaction type. With the “Advanced payment allocation”, introduced as part of the 1.9.0 release, the idea was to have a repayment strategy that supports dynamic configuration of the allocation rules for transaction types and configuration of more fine-grained allocation rules for future installments
14:40 - 15:10.
by Martin Desruisseaux
Track: Data Engineering
Room: Rhapsody
Geospatial data are ubiquitous, but the difficulty of handling them accurately is often under-estimated. Various projects implement their own routines for performing geospatial operations, but not always with awareness about the pitfalls of simple approaches. This talk will present some of the difficulties in mapping “real world” to digital data. Then we will present some international standards published jointly by the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO).
14:40 - 15:10.
by Alexandre Gallice
Track: API & Microservices
Room: Mirror Lounge
Apache Camel has been the proven Swiss Army knife of integration for years. In today’s world of workloads moving to the cloud, the need for disparate systems to communicate remains greater than ever. This context makes a Kubernetes Java stack like Quarkus a good fit to implement Camel routes. In this session, attendees can first expect a quick reminder about Camel Quarkus basics. Beyond that, some features that are useful day to day will be presented via concrete examples.
15:20
15:20 - 15:50.
by João Jandre Paraquetti, Daniel Augusto Veronezi Salvador, Bryan Lima & Rafael Weingärtner
Track: CloudStack
Room: Melody
Apache CloudStack (ACS) and KVM are a combination that many organizations decided to adopt. KVM is a widely used hypervisor with a vibrant community and support in different operating system distributions. While developing the KVM plugin functionalities, one normally tries to make use of the full potential of the hypervisor; however, while Libvirt, the toolkit used by ACS to manage KVM VMs, already supports native incremental snapshots, every volume snapshot/backup taken with ACS is a full snapshot/backup.
15:20 - 15:50.
by Zoltan Mezei
Track: Fintech
Room: Symphony
In this presentation we delve into infrastructure optimization options for supporting the scalability of Fineract. Key highlights of the session include: Performance testing: Exploring the newly-introduced capabilities of Fineract that enable drilling down to performance bottlenecks during development and in production. Performance improvements: Showing infrastructure and configuration changes that can improve Fineract’s response times and throughput under high-load scenarios. Scalability improvements: Presenting improvements on Fineract’s scalability capabilities, focusing on infrastructure-based scaling velocity improvements.
15:20 - 15:50.
by Simhadri Govindappa & Attila Turóczy
Track: Big Data Compute
Room: Rhapsody
The session will start by covering the latest developments in hive-iceberg, followed by an overview of the work done to seamlessly integrate Hive and Iceberg. It will also include a deep dive into the various features supported by hive-iceberg, including statistics, branching, tagging, compaction, concurrency, and much more.
15:20 - 15:50.
by Dominik Jelinek
Track: API & Microservices
Room: Mirror Lounge
Apache Camel is the leading open-source integration framework that simplifies the integration of various systems and applications. A comprehensive set of tooling exists that is specifically designed to support Camel developers working with Apache Camel in VS Code. These tools facilitate a seamless and efficient development experience, offering robust support and functionality tailored to the needs of Camel developers. In my session I would like to rely on the Extension Pack for Apache Camel, which contains a set of specific extensions for Camel but also leverages the VS Code ecosystem.
15:50
15:50 - 16:10
Afternoon break
16:10

Melody

Symphony

Rhapsody

Mirror Lounge

16:10
16:10 - 16:40.
by Wei Zhou
Track: CloudStack
Room: Melody
In this session, Wei will present how CloudStack 4.19 adds the capability to easily and quickly perform a light-touch integration of networking appliances with Apache CloudStack, allowing operators and end users to offer a broader range of networking services while empowering end users to effortlessly deploy their own virtualized network functions (VNFs).
16:10 - 16:40.
by Nadia Jiang
Track: Incubator
Room: Symphony
Q&A is one of the most effective ways to obtain knowledge, build connections, and create interaction. In open-source communities, Q&A is particularly crucial. It not only provides a platform for users and developers to collaboratively tackle technical issues and clarify uncertainties but also enhances the sharing and circulation of knowledge. By helping each other in resolving issues, community members forge stronger bonds and jointly advance their projects. Additionally, a robust Q&A system attracts new members, injecting fresh perspectives and energy into the community.
16:10 - 16:40.
by Luciano Resende & Hongyue Zhang
Track: Big Data Compute
Room: Rhapsody
This session explores the integrated use of Apache Toree, YuniKorn, Spark, and Airflow to create efficient, scalable data pipelines. We will start by discussing how Apache Toree provides an interactive analysis environment with Spark via Jupyter Notebook. Then, we’ll discuss using Apache YuniKorn to manage and schedule these computational resources, ensuring system efficiency. Central to our talk, we’ll delve into the role of Apache Spark in large-scale data processing, highlighting its integration with Toree and YuniKorn.
16:10 - 16:40.
by Addie Girouard
Track: Community
Room: Mirror Lounge
Collaborative governance in software is challenging. This presentation focuses on stakeholder participation, which seems limited to those with technical acumen, tooling expertise, and positions of influence. Yet evidence shows that great collaboration depends on quality divergent thinking balanced with quality convergent thinking. This presentation lays out a strategic framework that curates broader participation by leveraging a landscape of networks and communication channels. Governance in software development tends to exclude valuable insights from individuals outside the technical sphere.
16:50
16:50 - 17:20.
by Alexandre Mattioli
Track: CloudStack
Room: Melody
Apache CloudStack integrates with two major SDN solutions: Tungsten Fabric (OpenSDN) for KVM environments and NSX for VMware ESX environments. In this talk we’ll explore how these integrations were implemented, how to set up ACS Zones with these SDNs, and what capabilities they offer with regard to ACS.
16:50 - 17:20.
by Craig Russell
Track: Incubator
Room: Symphony
to submit patches to a podling? to release code to the public? to maintain trademarks for a podling? to become a committer on a podling? This talk explains the common barriers to accomplishing the objectives of people and projects. It explains why the ASF has: licensing requirements for code submissions and releases, signing and checksums, download protocols, voting requirements for releases and project membership, and trademark requirements for web sites and documentation.
16:50 - 17:20.
by Csaba Ringhofer & Daniel Becker
Track: Big Data Compute
Room: Rhapsody
Reading file formats efficiently is a crucial part of big data systems - in selective scans data is often only big before hitting the first filter and becomes manageable during the rest of the processing. The talk describes this early stage of query execution in Apache Impala, from reading the bytes of Parquet files on the filesystem to applying predicates and runtime filters on individual rows. Apache Impala is a distributed massively parallel analytic query engine written in C++ and Java.
16:50 - 17:20.
by Aparna Sundar
Track: Community
Room: Mirror Lounge
In this session, I share best practices for creating bar-raising documentation that guides users in using Figma and GitHub templates. To scale best practices in UX research, designers of open source software create various design artifacts that can help software builders use and improve on the open source code and curated experience offerings. In this talk, I offer examples of how OpenSearch creates a research process that can scale, a documentation process, and templates that designers and developers in the open source community can use in developing experiences for their users.
17:30
17:30 - 18:30
Birds of a Feather

Wednesday June 5, 2024

09:00
09:20
09:20 - 09:50.
by Dirk-Willem van Gulik
Track: Keynote
Room: Melody
Software has matured and is now an integral, key part of society, its infrastructure and economy. Yet, by and large, the industry’s stance on security, reliability and preventing data leaks has fallen way behind. We’re regularly front-page news. So - like all important engineering industries before it - that means that politicians all over the world have started to care and are introducing software regulation. Europe leads the pack with the now-final Cyber Resilience Act and the Product Liability Directive.
09:55
09:55 - 10:25.
by Sherae Daniel
Track: Keynote
Room: Melody
The path to successful progression through the ranks of an open-source community remains unclear. Historically, the quality and quantity of one’s technical skills have been essential components in progressing through the ranks in OSS communities. Because participants conduct much of this work in coding repositories, the demonstration of technical skills drives outcomes. However, given that individuals do not typically meet face to face, as they would in a conventional organisational setting, various on-line impression management techniques such as self-promotion (i.
10:25
10:25 - 11:10
Morning break & Poster sessions
11:00

Melody

Symphony

Rhapsody

Mirror Lounge

11:10
11:10 - 11:40.
by Bertrand Delacretaz
Track: Community
Room: Melody
The Asynchronous Decision Making techniques commonly used in open source projects enable efficient remote collaboration in teams that have no boss, no schedule and often no cultural consistency, yet produce world-changing software. These very efficient collaboration techniques can even work without computers and apply to most types of projects, not just software development. This talk describes the key elements and tools of the Asynchronous Decision Making process, based on more than twenty-five years of experience in Open Source projects, as well as examples from federated governments, which, interestingly, work in a similar way.
11:10 - 11:40.
by Marton Balassi & Peter Vary
Track: Performance Engineering
Room: Symphony
11:10 - 11:40.
by Aliaksandr Sheliustin
Track: Data Engineering
Room: Rhapsody
In this insightful presentation, Aliaksandr will unveil four ingenious tricks to maximize your Apache Airflow experience in the realm of data engineering. Starting with the power of leveraging CSV files to effortlessly create versatile DAGs, Aliaksandr will demonstrate how this flexibility can streamline your pipeline development process. Moving forward, the audience will learn how Google Sheets can be harnessed as a dynamic tool for DAG creation, opening up opportunities for collaboration among team members of varying Airflow proficiency levels.
11:10 - 11:40.
by Jean-frederic Clere
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
Now that HTTP/3 looks ready, we will look at where we stand with it in our servers. The “old” HTTP/2 protocol and the corresponding TLS/SSL are common to Traffic Server, HTTP Server and Tomcat. The presentation will briefly explain the new protocol and look at different implementations of it. Then the state of HTTP/3 in our three servers, and how to implement HTTP/3 in them, will be presented. A small demo supporting HTTP/3 will be run.
11:50
11:50 - 12:20.
by Rich Bowen
Track: Community
Room: Melody
For those of us who already know how important open source is, it can be challenging to persuasively make the case to management, because we assume that everyone already knows the basics. This can work against us, confusing our audience and making us come across as condescending or concerned about irrelevant lofty philosophical points. In this talk, we take it back to the basics. What does management actually need to know about open source, why it matters, and how to make decisions about consuming open source, contributing to open source, and open sourcing company code?
11:50 - 12:20.
by David Kjerrumgaard
Track: Performance Engineering
Room: Symphony
For over a decade, Apache ZooKeeper has played a crucial role in maintaining configuration information and providing synchronization within distributed systems. Its unique ability to provide these features made it the de facto standard for distributed systems within the Apache community. Despite its widespread adoption, there is an emerging trend toward eliminating the dependency on ZooKeeper altogether and replacing it with an alternative technology. The most notable example is the KRaft subproject within the Apache Kafka community,
11:50 - 12:20.
by Subham Rakshit
Track: Data Engineering
Room: Rhapsody
11:50 - 12:20.
by Mark Thomas
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
Apache Tomcat implements the Jakarta Servlet, Jakarta Pages, Jakarta Expression Language, Jakarta WebSocket and Jakarta Authentication specifications. Jakarta EE 11 is due for release in the first half of 2024, with the first stable Tomcat 11 release expected shortly afterwards. This session will look at the changes in Jakarta EE 11 for the specifications that Tomcat implements and what these changes mean for developers looking to deploy their applications on Tomcat 11.
12:30
12:30 - 13:00.
by Omotola Omotayo
Track: Community
Room: Melody
Open source has grown to welcome people from many different tech career paths, who enhance projects with their skills, and to provide jobs for those interested in working with open source. Open source contribution programs provide opportunities for interested individuals to become professionals. Outreachy is a paid, remote open source internship program that empowers interns, grows their talent, and prepares them for career growth. Outreachy provides internships to people subject to systemic bias and impacted by underrepresentation in the technical industry where they are living.
12:30 - 13:00.
by Gabor Kaszab & Zoltan Borok-Nagy
Track: Performance Engineering
Room: Symphony
Apache Impala is a distributed, massively parallel query engine designed for high-performance querying of large-scale data. A long list of features has recently been added to support Apache Iceberg tables, such as reading, writing, time travel, and so on. However, in a big data environment, performance is also a must. Since Impala has been designed to be fast, it has its own way of reading Iceberg tables.
12:30 - 13:00.
by Gyula Fora & Attila Mészáros
Track: Data Engineering
Room: Rhapsody
12:30 - 13:00.
by Shu Kit Chan
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
The WebAssembly (Wasm) plugin for Apache Traffic Server (ATS) allows WebAssembly modules following the “proxy-wasm” specification to be run on ATS. The talk will begin by introducing the background and history of plugins and programmability in ATS. I will go over the shortcomings of the current offerings and then introduce the Wasm plugin as an alternative solution. I will then talk about the “proxy-wasm” specification, which describes the support of WebAssembly modules for proxy server software.
13:00
13:00 - 14:00
Lunch
11:00

Melody

Symphony

Rhapsody

Mirror Lounge

14:00
14:00 - 14:30.
by Brian Proffitt
Track: Community
Room: Melody
There are millions of open source projects people can use and contribute to. Why yours? Developing an open source project that is valuable to many and widely accepted in an industry requires a lot of care and feeding – and more than just code. Whether your project is brand new or has been around for decades, you need to explain why other people should take the time to learn, use, and potentially contribute to it.
14:00 - 14:30.
by Paul Brebner
Track: Performance Engineering
Room: Symphony
Instaclustr (now part of NetApp) manages hundreds of Apache Kafka clusters of many different sizes, for a variety of use cases and customers. For the last 7 years I’ve been focused outwardly on exploring Kafka application development challenges, but recently I decided to look inward and see what I could discover about the performance, scalability and resource characteristics of the Kafka clusters themselves. Using a suite of Performance Engineering techniques, I will reveal some surprising discoveries about cosmic Kafka mysteries in our data centres, related to: cluster sizes and distribution (using Zipf’s Law), horizontal vs.
14:00 - 14:30.
by Oz Katz
Track: Data Engineering
Room: Rhapsody
Apache Arrow has become a de facto standard for representing large datasets and a very useful tool in any modern data engineering stack. While it allows different technologies to better communicate and share data, the ecosystem around it enables much more! In this talk we’ll cover how Arrow interacts with Apache Spark - from allowing PySpark to interoperate with other Python data libraries, to building data-driven applications with the recent addition of Spark Connect.
14:00 - 14:30.
by Remy Maucherat & Jean-frederic Clere
Track: Tomcat, Httpd and other servers
Room: Mirror Lounge
This session explores the use of the FFM API from Java 22 to leverage native library capabilities in the context of Apache Tomcat. Here, OpenSSL is used to provide support for TLS through the JSSE API, without the need for the tomcat-native wrapper library. Exploratory design of QUIC and HTTP/3 support from OpenSSL 3.3+ is also discussed.
14:40
14:40 - 15:10.
by Greg Brown
Track: Community
Room: Melody
How do you explain your Apache project to people who don’t even know how to download apps onto their phones – and still manage to get them excited about what you’re working on? It’s simple: pretend you’re talking about a movie. The problem isn’t the projects, but how we’ve been talking about them. And now we’re going to fix that. In this talk, discover how to completely change the way we discuss Apache and open source by not actually talking about open source or Apache…but instead using the same principles that marketers use to create excitement around a movie.
14:40 - 15:10.
by Gabor Somogyi
Track: Big Data Compute
Room: Symphony
14:40 - 15:10.
by Hongyue Zhang & Luciano Resende
Track: Data Engineering
Room: Rhapsody
Data quality plays a crucial role in data engineering to enable efficient and insightful data pipelines at scale. In this session, we will leverage Apache Iceberg as the scalable table format with ACID guarantees, use Apache Toree’s interactive computation capabilities, and orchestrate the automated data workflow on Apache Airflow. We will start by talking about how Iceberg can use its column-level statistics stored in metadata for efficient and reliable data quality validation.
14:40 - 15:10.
by Paul King
Track: Groovy
Room: Mirror Lounge
This talk looks at using Groovy for a well-known data-science problem: classifying Iris flowers. It involves solving this problem using the latest deep-learning neural network technologies and has the option of using GraalVM for blazing speed. Groovy provides a data-science environment with the simplicity of Python but using Java-like syntax and your favourite JVM technologies.
15:20
15:20 - 15:50.
by Edith Puclla
Track: Community
Room: Melody
In this presentation, we will delve into the important role that Apache Airflow plays in the Outreachy program and its broader influence in closing inclusion gaps within the open source community. We will explore the success stories and transformative experiences of Outreachy contributors, emphasizing how this open source project has created opportunities for people from diverse backgrounds. Our discussion will focus on the power of open source initiatives like Apache Airflow to foster a more inclusive and accessible technology ecosystem.
15:20 - 15:50.
by Hichem Kenniche
Track: Big Data Compute
Room: Symphony
As machine learning (ML) models increasingly become integral components of modern applications, there is a growing need to deploy them in real-time environments. Apache Spark is a popular open-source framework for large-scale data processing that supports ML tasks, while Kubernetes provides a powerful platform for container orchestration and deployment. However, combining Spark and Kubernetes poses significant challenges, especially when it comes to achieving low latency and high scalability. In this session, we explore optimal approaches for real-time ML with Apache Spark on Kubernetes, including best practices and strategies for efficient model training, deployment, and serving.
15:20 - 15:50.
by Christina Lin
Track: Data Engineering
Room: Rhapsody
Traditionally, dealing with real-time data pipelines has involved significant overhead, even for straightforward tasks like data transformation or masking. However, in this talk, we’ll venture into the dynamic realm of WebAssembly (WASM) and discover how it can revolutionize the creation of stateless streaming pipelines within a Kafka (Redpanda) broker. These pipelines are adept at managing low-latency, high-data-volume scenarios.
15:20 - 16:10.
by Sergio del Amo
Track: Groovy
Room: Mirror Lounge
In this session, Sergio del Amo introduces the Micronaut® framework and demonstrates how the framework’s unique compile-time approach enables the development of ultra-lightweight Java applications. Compelling aspects of the Micronaut framework include: developing applications with Java, Kotlin, or Apache Groovy; sub-second startup time; small processes that can run in as little as 10 MB of JVM heap; no runtime reflection; dependency injection and AOP; reflection-free serialization; and a database access toolkit that uses ahead-of-time (AoT) compilation to pre-compute queries for repository interfaces.
16:10
16:10 - 16:40.
by Niklas Merz
Track: Community
Room: Melody
It takes a village to run an open source project successfully. A village is usually run by its citizens and governed by some elected officials. In open source we call the citizens “users” and the people in charge of a project “maintainers”. To understand the health and sustainability of a project, we should take a closer look at the community first, and not necessarily at the code. To understand its demographics, a village can run a census.
16:10 - 16:40.
by Martijn Visser
Track: Big Data Compute
Room: Symphony
16:10 - 16:40.
by Muhammet Orazov
Track: Big Data Compute
Room: Rhapsody
16:10 - 16:40.
by Paul King
Track: Groovy
Room: Mirror Lounge
Calling all developers with a penchant for fine whiskey! Join Dr. Paul King, VP at Apache Groovy, on a quest to analyze whiskeys produced by the world’s top 86 distilleries to identify the perfect single-malt Scotch. How will he perform this analysis? By using the traditional and distributed K-means clustering algorithm from various Apache projects. Bottoms up!
16:50
16:50 - 17:20
Lightning Talks