¿Cómo leer un valor de cadena en la estructura de matriz JSON?

Este es mi código:

df_05_body = spark.sql("""
     select 
    gtin
    , principalBody.constituents
 from 
v_df_04""")

df_05_body.createOrReplaceTempView("v_df_05_body")

df_05_body.printSchema()

Este es el esquema:

root
 |-- gtin: array (nullable = true)
 |    |-- element: string (containsNull = true)
 |-- constituents: array (nullable = true)
 |    |-- element: array (containsNull = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- constituentCategory: struct (nullable = true)
 |    |    |    |    |-- value: string (nullable = true)
 |    |    |    |    |-- valueRange: string (nullable = true)

¿Cómo cambiar la principalBody.constituentsfila en el SQL para leer los campos constituentCategory.valuey constituentCategory.valueRange?

Answer

La columna constituentses una matriz de matrices de estructuras. Si su intención es obtener una estructura plana, deberá aplanar las matrices anidadas y luego explotar:

df_05_body = spark.sql("""
    WITH
      v_df_04_exploded AS (
      SELECT
        gtin,
        explode(flatten(principalBody.constituents)) AS constituent
      FROM
        v_df_04 )

    SELECT
      gtin,
      constituent.constituentCategory.value,
      constituent.constituentCategory.valueRange
    FROM
      v_df_04_exploded
""")

O simplemente usando inlineafter flattenasí:

df_05_body = spark.sql("""
    SELECT
      gtin,
      inline(flatten(principalBody.constituents))
    FROM
      v_df_04_exploded
""")