Cheese Join Example
====================================

Let's suppose we had two datasets as follows:

.. table:: ``main_df``
    :class: longtable

    =============== ======== ================
    Name            Group ID Favourite Food
    =============== ======== ================
    \               2        Egg + Spam
    John Cleese              Spam
    Eric Idle       3        Spam
    \                        Egg Bacon + Spam
    Terry Jones              Egg
    Terry Gilliam   2        Spam
    Carol Cleveland          Egg
    Neil Innes      1        Spam
    =============== ======== ================


.. table:: ``cheese_df``
    :class: longtable

    =================== ======== ================ ===================
    Name                Group ID Favourite Food   Cheese Choice
    =================== ======== ================ ===================
    Graham Chapman               Egg + Spam       Red Leicester
    John Marwood Cleese 3        Spam             Tilsit
    Eric Idle           3        Eggs             Bel Paese
    Michael Palin                Egg Bacon + Spam Red Windsor
    Terry Jones         2        Egg              Stitlton
    Winnie the Pooh     1        Honey            Gruyère
    Terrence Gilliam    2        Spam             Emmental
    Carol Cleveland     1        Egg              Norwegian Jarlsberg
    Neil Innes          1        Egg              Liptauer
    =================== ======== ================ ===================

We might want to join these two datasets together, but we have a problem: the names
don't match exactly. In this case, we could write some logic to automatically clean the
names and then join the datasets together. However, for a more complicated example, this
might not be possible so let's pretend we can't do that. In addition, some of the names
are missing entirely.

We might come up with the following strategy to join the datasets together:

.. image:: ../_static/examples/cheese_join/flowchart_dark.png
    :class: only-dark
    :align: center

.. image:: ../_static/examples/cheese_join/flowchart_light.png
    :class: only-light
    :align: center

This would be the expected output for the join:

.. table:: ``join_df``
    :class: longtable

    ======== ================ =================== =================== =================
    Group ID Favourite Food   Name                Cheese Choice       Join Type
    ======== ================ =================== =================== =================
    2        Egg              Graham Chapman      Red Leicester       Food
    3        Spam             John Marwood Cleese Tilsit              Food and Group
    3        Spam             Eric Idle           Bel Paese           Name
    \        Egg Bacon + Spam Michael Palin       Red Windsor         Food
    2        Egg              Terry Jones         Stitlton            Name
    2        Spam             Terry Gilliam       Emmental            Name
    1        Egg              Carol Cleveland     Norwegian Jarlsberg Name
    1        Spam             Neil Innes          Liptauer            Name
    ======== ================ =================== =================== =================

And here's how you'd set that up using a ``joinplex``-style interface:

.. code-block:: python

   name_join = ColJoin(main_df, cheese_df, on='Name')
   food_join = ColJoin(main_df, cheese_df, on='Favourite Food')
   group_id_join = ColJoin(main_df, cheese_df, on='Group ID')

   # A plex is a general object representing a table operation - e.g. joining or filtering.
   plex = OneManyNoneFunnel(
       name_join,
       one="Name",
       many=ValueError,
       none=OneManyNoneFunnel(
           food_join,
           one="Food",
           none=ValueError,
           many=OneElseFunnel(
               group_id_join,
               one="Food and Group",
               many_or_none=ValueError,
           )
       )
   )

   join_df = plex.run(
       label_col = 'Join Type',
       right_cols = ['Name'],
       combined_cols = ['Group ID']
   )

Doing the same thing directly in ``pandas`` would be very tedious, and difficult to
understand at a glance.

This example has been fairly straightforward. Joinplex should be able to
handle more complicated examples, such as:

* Many-to-many, one-to-many and many-to-one joins
* Joins with three tables or more involved
* Funnelling based on the number of matches on both sides of the join (e.g. one-to-one
  distinguished from many-to-one)
* Other join types than ``ColJoin``-based on a functional comparisons, spatial joins
  using ``geopandas``, etc. - More complex funnelling configurations involving cycles,
  multiple uses of the same join, etc., yet being economical with computation.
* Precedence: require that one join outcome is reached before another join is completed,
  and then restrict the possible join candidates only to those which are not already
  joined. This is useful in situations where you can use “process of elimination” style
  logic.