Chief Architect | Advanced Computing | HiSilicon Turing Department, Huawei Technologies, Germany / January 2021 — Present 

    • Improve the European Arm and AI ecosystem (weather simulation, life sciences and computer-aided engineering) and contribute to next-generation SoC and low-level software technologies
    • Combine HPC and AI technologies seamlessly into one common platform
    • Research and develop graph neural network (GNN) models for Huawei’s artificial intelligence portfolio
    • Analyse hardware and software architectural optimisations for sparse linear algebra
    • Develop SVE-enabled libraries and benchmarks for typical European HPC applications
    • Represent Huawei in standardisation and industry activities, including EU-level strategy-shaping bodies such as the European Technology Platform for HPC (ETP4HPC)
    • Drive technical academia engagement programs and cooperation projects
    • Hire additional staff for the HiSilicon advanced computing team, including screening candidate profiles, supporting interviews, refining job descriptions and coordinating with recruiters

Chief Architect | HiSilicon Turing Department, Huawei Technologies, Germany / January 2019 — Present 

    • Enable the Arm for HPC and Arm+AI ecosystem with cooperation partners to foster the use of HiSilicon solutions, and gather requirements for next-generation server systems
    • Support the HiSilicon HQ R&D team in designing efficient SoC architectures for advanced computing and autonomous driving
    • Represent Huawei in standardisation and industry activities, including EU-level strategy-shaping bodies such as the European Technology Platform for HPC (ETP4HPC)
    • Drive technical academia engagement programs and cooperation projects
    • Hire for the HiSilicon advanced computing team, including screening candidate profiles, supporting interviews, refining job descriptions and coordinating with recruiters
During my time in this position, I have successfully matched candidates against job roles and assessed their suitability. I have also orchestrated the deployment of Huawei’s server and AI hardware with many partners active in computer vision, machine learning, scientific computing or supercomputing.

Chief Architect | Central Hardware Department, Huawei Technologies, Germany / January 2015 — December 2019 

    • Hired and led a team of >7 experts focused on computation (micro-architecture), interconnect and memory technologies
    • Organised the project charter for an Arm-based advanced computing prototype, including the definition of project scope, key technologies and milestones
    • Defined architectural elements and improvements for Arm-based computing systems in the areas of computation, memory system, storage, switching and interconnect
    • Identified and researched new key technologies that contributed to Huawei’s system strategies
    • Facilitated the architectural concept for in-memory processing technologies for use in Arm-based systems
    • Collaborated with three significant research and data centres in Europe to evaluate Arm-based systems and carry out experimental hardware co-design
    • Established partnerships for Horizon 2020 projects on advanced computing
    • Served as principal representative to the Cache Coherent Interconnect for Accelerators (CCIX) standard, which is implemented in the HiSilicon Kunpeng 920 SoC
    • Defined high-impact libraries and performance tools for the Arm architecture, generated requirements and executed performance optimisation
    • Contributed to the incubation of a team focusing on autonomous driving

Senior Engineer | Systems Optimisations Competency Center, IBM Research & Development, Germany  / January 2014 — December 2014

    • Led the technical and performance team for the SAP HANA in-memory, column-oriented, relational database management system on the IBM POWER architecture
    • Ensured that performance results were on par with competitive hardware configurations
    • Helped achieve performance objectives by leading POWER-specific code development and executing performance evaluations
    • Analysed utilisation of hardware resources (memory bandwidth, threads, cores and sockets) during intensive scaling tests
    • Defined the scale-out and scale-up system architecture and executed the corresponding performance measurements
    • Investigated hot functions at the micro-architecture level
    • Improved vector code coverage in SAP HANA by 6%, yielding a 15% performance gain

Senior Engineer | Blue Gene Active Storage, IBM Research & Development, Germany  / July 2011 — December 2013

    • Led the technical engineering team of >8 people, which was responsible for the development and delivery of the Blue Gene Active Storage (BGAS) architecture
    • Contributed to the BGAS architecture to achieve a balanced integration of solid-state storage, computation and a cost-scalable network
    • Leveraged Blue Gene/Q as a vehicle for rapid prototyping of active storage concepts
    • Executed proofs-of-concept on computing-in-storage for applications in neuroscience and middleware software packages including GPFS and DB2
    • Responsible for the architecture and development of software packages (including peripheral image, device driver, FPGA image and middleware frameworks) for a hybrid scalable solid-state storage device, which targeted research explorations
    • Decomposed acceleration functions for industrial and scientific application scenarios
    • Responsible for the development of a software-based RDMA network interface controller
    • Coordinated research engagements with three customers in Germany, Switzerland and the United Kingdom
    • Mentored bachelor's and master's students

Senior Engineer | Blue Gene/Q, IBM Research & Development, Germany  / April 2009 — June 2011

    • Led a global PCI Express verification and performance team for the Blue Gene/Q ASIC, completing ahead of schedule and meeting the promised performance targets
    • Led and executed the hardware bring-up of the PCI Express core of the Blue Gene/Q ASIC
    • Created the verification and performance plan and regularly reported status to executive management and customers
    • Developed proxy applications to simulate parallel file-system traffic tunnelled via InfiniBand over PCI Express
    • Created a hardware simulation environment to imitate the operation of the Blue Gene/Q ASIC in combination with PCI Express attached devices (including physical, link and transport layer)
    • Implemented an automatic regression framework for I/O traffic on ASICs

Resident Engineer | Open Systems Development, IBM Research & Development, Germany / January 2008 — March 2009

    • Led the bring-up of the QPACE project and coordinated >15 developers from industrial and academic partners
    • Responsible for the architecture and the development of the firmware for the compute node of the QPACE project
    • Developed the bring-up plan for the QPACE project

Resident Engineer | Open Systems Development, IBM Research & Development, Germany  / September 2006 — December 2007 

    • Contributed to the firmware development of blade servers using the PowerPC 970 and Cell/B.E. processors
    • Responsible for the PCI Express device discovery algorithm
    • Ensured the hardware bring-up, compatibility and performance of PCI Express-based InfiniBand adapters
    • Accountable for PCI Express compliance testing with a focus on the physical layer, configuration space, link and transport layers and platform configuration
    • Led the AbiCell and NICOLL projects

Staff Engineer | I/O Firmware Development, IBM Research & Development, Germany / September 2004 — August 2006

    • Developed the Linux kernel framework and interrupt processing routines for the IBM System p InfiniBand and Ethernet device driver
    • Ensured compatibility and performance with upper-level protocols such as the Message Passing
      Interface (MPI) standard, Sockets Direct Protocol (SDP) and communications using TCP/IP over InfiniBand
    • Coordinated the open-source and release process with the Linux kernel development community and
      Linux distributors
    • Led the technical bring-up and development of a parallel cluster based on PowerPC 970 processors and ultra-low-latency InfiniBand network components
    • Contributed to software optimisations and performance tests that helped make QPACE one of the most energy-efficient supercomputers of June 2010