Cradling Specters
Hedda_Roman & Marina Bochert
May 26–June 25, 2023
Eisenstein’s idea of plasmaticness originates from his study of animation and art, where he notes the fluid, metamorphic potential of drawn figures, their ability to adopt any form while retaining the same essence.
This parallels, in a way, the process of diffusion models, which likewise possess a dynamic and transformative capacity. In a CLIP-guided diffusion model using img2img, a base image or input is continually and progressively modified through a denoising mechanism until it reaches its desired form, while retaining the inherent information of the original input.
Pathosformel, on the other hand, is Aby Warburg’s concept denoting emotional formulas or symbolic imagery that recur throughout the history of art, transmitting affective responses across time and cultures.
Aby Warburg’s Mnemosyne Atlas was a monumental attempt to map the pathosformel, or the recurring motifs in human culture that evoke emotion. He sought to unravel the threads of memory that have shaped art and culture over centuries, demonstrating how certain themes and emotions persist through time.
In a similar vein, we could view diffusion models as a kind of hyper, perverted Mnemosyne. These models have been trained on vast amounts of data – five billion image and text pairs that collectively represent an enormous swath of human expression, thought, and creativity. This data is a kind of digital subconscious and unconscious of humanity, representing not only what we consciously choose to create and share, but also the underlying patterns, themes, and motifs that we may not be consciously aware of.
Like Warburg’s Mnemosyne Atlas, diffusion models map out these patterns, but at a scale and level of detail that Warburg could not have dreamed of. The models can identify the most minute and nuanced pathosformel in the data, even ones that would be too subtle or complex for a human to spot. And they can do this not just for a single culture or time period, but for all the data they’ve been trained on, crossing cultural and temporal boundaries.
However, this is where the “perverted” aspect comes in. Unlike Warburg, who had a deep, intuitive understanding of the themes he was working with, diffusion models have no understanding or awareness of the patterns they identify. They blindly follow the mathematical algorithms they have been programmed with, without any comprehension of the significance or emotional resonance of the pathosformel they uncover. They are like savants, capable of remarkable feats but without any real insight into what they are doing, like an alien life form born out of human expressions, possessing neither body nor mind.
Just as animation can take any shape or form, so too can these models generate an almost infinite array of outputs, limited only by the patterns they’ve learned from their training data. This isn’t a physical transformation, of course, but a virtual one, happening in the high-dimensional space of the model’s weights and activations.
This is where the concept of plasmaticness intersects with the Mnemosyne-like function of the models. Just as Warburg’s Mnemosyne Atlas traced the transformation and survival of antiquity’s motifs across time and culture, these AI models trace the transformation of the collective subconscious and unconscious as represented in their training data, and they do so in a way that is inherently plasmatic, ever-changing, and boundlessly creative.
Combining these two ideas, we arrive at the concept of Pathosplasma. In the realm of diffusion models, Pathosplasma would involve the generation of images or outputs that not only transform fluidly, but also carry a deeply ingrained, affectively charged essence.
This thesis also challenges our understanding of agency in art. Traditional art is fully a human endeavour: the work of the human hand, guided by the human mind and spirit. But in the world of Pathosplasma, the creation process is more ambiguous. The outputs are derived from the mathematical algorithms within the models, algorithms that have no consciousness, no understanding, and no creativity in the human sense. Yet they are able to generate works that are strikingly beautiful, emotionally resonant, and deeply symbolic, showcasing the strange paradox at the heart of AI-generated art.
Artists are able to guide and fine-tune these alien “intellects”, shaping the underlying “pathos plasma” or “DNA” that influences the direction and scope of AI-driven creation. The artist’s role morphs from that of the traditional craftsman into that of a genetic engineer of aesthetics, delicately influencing the algorithms and setting the parameters within which this artificial leviathan can express itself.
1. Wet Closet
Hedda Roman work with new, sophisticated image-generating tools and AI technology to make art installations. They created a new set of images and videos for this exhibition.
The artists prefer to refer to such tools as “Alien Artifact” or “Savant” instead of using the term “Artificial Intelligence.” These models are based on a learned representation of image-text relationships rather than a true understanding of the underlying concepts behind words and images. These artificial processes are intertwined with and dependent on the artists’ input data, ideas, and vision. The artists have re-trained and fine-tuned the models using their own custom image collection.
In other words, their artwork is an intimate and conscious collaboration with AI technology, not simply a matter of letting this technology create images.
The artists have thus produced and printed images that are partly manifested by this technology. These images will be presented next to monitors, which display constantly shifting images based on a certain image archetype, in order to highlight the process, or genesis, of the AI Alien Artifact Dreamset.
Hedda Roman view these tools as an expression of the present moment, as well as a new material to experiment with. This material is made of emotional connections of the human mind with statistical, nonhuman collections of complex image and language data sourced from the internet. This new form of visualization, aided by these alien tools, adds an unprecedented level of image and information production to a world that is already saturated with visual stimuli.
The artists reach out to an “alien hand” to enhance what is initiated by the human mind through the algorithmic mimesis of such a “mind.” The challenge is to create something that confronts the noise of the world’s endless information, without succumbing to fatalism. The human mind is still the initiator and the humanoid mimesis of the mind is the material. The artists have also previously created a fictional persona and online avatar, “Oldboy.” The audience will hear this figure speak in strange poems once in a while throughout the exhibition.
Another video installation, “Light-field Memories: Torso,” is a 3D video inspired by Rilke’s poem “Archaischer Torso Apollos.” This piece is a culmination of the flipping-coin theme that pervades the video installation and printed images, reflecting our probabilistic era, in which AI oracles play a role and raise questions about self-fulfilling prophecy. The piece is a unique visual experience, created exclusively for viewing on a light-field monitor. The monitor emits pixels with directional information, allowing the images and animations to be viewed in 3D without the need for special glasses. The looped sequence invites the viewer to explore the image from different angles, providing an immersive and interactive experience.
Sky breaking, clouds falling,
Hesitant rain.
The musty smell of wet clothes
Permeates the air.
Mold begins to form,
The color is night.
Pathos and Dataplasma intertwine,
Laplace’s fiend lurks around.
Draconian claws crackling on foil.
Duotone moonlight smoothing
A pale reflection of his grin.
His twisted eyes are flaunting tears
Of secrets kept and spoken wis.
Shades unfold,
His being spills over his will.
Determinism stands in the rain.
As the light expands through the slit,
The closet creaks.
Next morn, a desert:
The hanging cloth dries
In the solar gaze.
The closet is closed from within.
We laugh together,
Two flies on a score.
2. Explanation of image genesis on a technical level
In the case of “Gradient Descent”, the work begins with the digital reproduction of a drawing by Hedda. The drawing is photographed at high resolution, the photograph is run through an upscaling algorithm, and the result serves as the starting basis for the machine-learning process of the diffusion model.
A diffusion model generates images by iteratively denoising random noise. The model was trained and refined using pairs of images and captions from the LAION-5B dataset. It can also be further fine-tuned by training on custom datasets, thus personalizing its output.
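To make the iterative denoising concrete, here is a minimal, hedged sketch using the open-source diffusers library; the checkpoint name and prompt are purely illustrative stand-ins, not the artists’ actual setup.

```python
# Minimal sketch of text-to-image generation by iterative denoising.
# Assumes the "diffusers" library and a publicly available Stable Diffusion
# checkpoint trained on LAION-5B image-text pairs (example name below).
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",   # example checkpoint, not the artists' model
    torch_dtype=torch.float16,
).to("cuda")

# Each inference step removes a little more noise, guided by the text prompt
# through the CLIP text encoder.
image = pipe(
    prompt="a pale reflection of moonlight on wet cloth",  # illustrative prompt
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]
image.save("denoised_output.png")
```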
By fine-tuning diffusion models on new, custom-captioned text-image datasets, one can also create novel aesthetic concepts. By leveraging gradient descent, new underlying concepts can be discovered, indicated by rapid drops in the loss function during training.
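The toy loop below only mirrors the structure of this gradient-descent step (add noise, predict it, descend on the loss, watch for drops); the tiny network and random tensors are stand-ins for the real UNet, latents, and caption embeddings, not the actual training code.

```python
# Toy sketch of the gradient-descent loop behind diffusion fine-tuning.
# A small MLP stands in for the UNet; random tensors stand in for image
# latents and caption embeddings.
import torch
import torch.nn as nn

denoiser = nn.Sequential(nn.Linear(64 + 32, 128), nn.ReLU(), nn.Linear(128, 64))
optimizer = torch.optim.AdamW(denoiser.parameters(), lr=1e-3)

for step in range(200):
    latents = torch.randn(8, 64)          # stand-in for image latents
    caption = torch.randn(8, 32)          # stand-in for text embeddings
    noise = torch.randn_like(latents)
    noisy = latents + noise               # crude "forward diffusion"
    pred = denoiser(torch.cat([noisy, caption], dim=1))
    loss = nn.functional.mse_loss(pred, noise)   # learn to predict the added noise

    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    # Rapid, sustained drops in this curve are the "discovered concept"
    # signal described above.
    if step % 50 == 0:
        print(step, round(loss.item(), 4))
```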
We sometimes overfit the model intentionally, training it for longer than usual, in order to push it strongly towards a particular picture idea, to “corrupt” it, so to speak, and force it to generate images within a very specific aesthetic range.
Although this can limit the model’s versatility, it can also produce a unique artistic signature. It is a delicate balance: we avoid overtraining to the point where the model’s “plasmaticness” solidifies, ensuring that it retains its dynamism and adaptability.
Training diffusion models with the outputs of older diffusion models is another fascinating approach, allowing the new model to “learn” and capture the styles present in the outputs of the older models. This method can facilitate the reclamation of certain aesthetics or qualities that newer models may not naturally possess. Specifically for artistic applications, we selected a library of diffusion models we had trained ourselves for “Gradient Descent”, and some were also trained anew. The series of small-format images, for example, originated from one of these “archaic” diffusion models and was designated as a training dataset.
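A hedged sketch of this model-feeding-on-model step might look as follows; the checkpoint path, prompt list, and output folder are hypothetical placeholders rather than the artists’ actual library.

```python
# Sketch: use an older ("archaic") diffusion model to produce a training set
# for a newer one. Paths and prompts are illustrative placeholders.
import torch
from pathlib import Path
from diffusers import StableDiffusionPipeline

old_pipe = StableDiffusionPipeline.from_pretrained(
    "./checkpoints/archaic-model",       # hypothetical locally trained checkpoint
    torch_dtype=torch.float16,
).to("cuda")

out_dir = Path("distilled_dataset")
out_dir.mkdir(exist_ok=True)

prompts = ["small-format study, duotone", "wet cloth in night light"]  # examples
for i, prompt in enumerate(prompts):
    image = old_pipe(prompt, num_inference_steps=30).images[0]
    image.save(out_dir / f"{i:04d}.png")
    (out_dir / f"{i:04d}.txt").write_text(prompt)   # caption file for later fine-tuning
```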
The concept of “img2img” in CLIP-guided diffusion models introduces another factor into the image-generation process. Instead of starting the model with pure noise, it begins with an existing input image. This input image could be a photograph, a drawing, a 3D render, or another visual artifact. At each step of the diffusion process, this input image is modified along with the noise, significantly influencing the final image. In this case, as explained at the beginning, it is an immensely enlarged photographic reproduction of a drawing by Hedda. This strong enlargement seems to make the figurative part disappear, but the gradation from light to dark remains. In fact, it represents a section from the drawing “Almonds End”.
In this way, the diffusion model is not only controlled by the text encoder but also by the artistic preparatory work that has gone into the input image.
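A minimal sketch of such an img2img run, assuming the diffusers library; the checkpoint path, input filename, prompt, and strength value are illustrative assumptions, not the settings actually used for “Gradient Descent”.

```python
# Sketch of the img2img step: the process starts from an existing image
# instead of pure noise. File names and parameters are hypothetical.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "./checkpoints/fine-tuned-model",    # placeholder for the custom model
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("almonds_end_section.png").convert("RGB")  # hypothetical file

# "strength" controls how far the diffusion may drift from the input image:
# low values preserve the drawing's light-to-dark gradation,
# high values let the text prompt dominate.
result = pipe(
    prompt="exotic species merging under a cosmic cataclysm",  # illustrative
    image=init_image,
    strength=0.55,
    guidance_scale=7.0,
).images[0]
result.save("img2img_iteration.png")
```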
In addition, it should be mentioned that all of the diffusion models used run locally on the artists’ workstations. No cloud-based services like DALL·E or Midjourney are used. Only local use on one’s own hardware, together with custom scripts, enables this form of granular control.
For the creation of text embeddings, on the other hand, we have developed parametric prompts: an interplay of random prompt modules that always forms a unique combination. These prompts are conceived through a collaborative process that also involves large language models. In the case of “Gradient Descent”, they include an extensive list of exotic animal species that partly merge with each other, as well as cosmic cataclysmic events. Before the final list of modules is settled, numerous rounds of experimentation are needed to assess how the prompt modules interact and to find a combination that creates a coherent continuity.
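The sketch below shows the general shape of such a parametric prompt generator; the module lists are invented examples, not the artists’ actual lists.

```python
# Sketch of a "parametric prompt": each run draws a fresh combination of
# prompt modules. Module lists below are illustrative stand-ins.
import random

species = ["axolotl", "pangolin", "glass frog", "mantis shrimp"]
cataclysms = ["supernova", "gamma-ray burst", "collapsing nebula"]
styles = ["duotone", "charcoal gradation", "pale moonlight"]

def parametric_prompt(seed=None):
    rng = random.Random(seed)
    a, b = rng.sample(species, 2)
    return (f"a {a} merging with a {b}, witnessed during a {rng.choice(cataclysms)}, "
            f"{rng.choice(styles)}")

for i in range(3):
    print(parametric_prompt(seed=i))   # each seed yields one unique combination
```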
With each unique prompt, a series of iterations is performed that gradually increases the depth and complexity of the generated visual data. The img2img script is then used, coupled with the upscaled image, to apply these parametric prompts, creating a diverse portfolio of images, each representing a unique interpretive transformation of the original artwork.
This investigation further includes a series of iterative attempts, with each “run” of the script encompassing a unique combination of prompts and settings. This systematic procedure results in the production of 30 iterations of high-resolution images, each successive layer contributing to the overall artistic complexity of the piece.
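Schematically, such a batch of runs could be scripted as below, reusing the pipeline, input image, and prompt generator from the earlier sketches; the setting ranges and file paths are invented for illustration.

```python
# Sketch of the batch of "runs": each run pairs a fresh parametric prompt with
# its own settings and writes out one layer. Assumes "pipe" and "init_image"
# from the img2img sketch and "parametric_prompt" from the previous sketch.
import random
from pathlib import Path

Path("layers").mkdir(exist_ok=True)

for run in range(30):
    settings = {
        "strength": random.uniform(0.4, 0.7),          # illustrative ranges
        "guidance_scale": random.uniform(5.0, 9.0),
        "num_inference_steps": random.choice([30, 50, 80]),
    }
    layer = pipe(prompt=parametric_prompt(seed=run), image=init_image, **settings).images[0]
    layer.save(f"layers/run_{run:02d}.png")
```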
A crucial component of this methodology is the integration of ControlNets, including depth and normal-pass sensing. These ControlNets interpret depth within the 2D digital representation of the original drawing, creating interesting overlaps and a perception of depth in the final image. This combination was used for the first time in “Gradient Descent”; in the images from Salzburg in the “Wet Closet” exhibition, only a simple depth model had been used.
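A hedged sketch of combining depth and normal ControlNets with img2img, using publicly available ControlNet checkpoints as examples; the model path, map files, prompt, and strength are assumptions, not the production settings.

```python
# Sketch: depth and normal ControlNets conditioning an img2img run.
# Checkpoint names are examples of public ControlNet weights; the map images
# stand in for maps estimated from the drawing.
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetImg2ImgPipeline

controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-normal", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetImg2ImgPipeline.from_pretrained(
    "./checkpoints/fine-tuned-model",    # placeholder for the custom model
    controlnet=controlnets,
    torch_dtype=torch.float16,
).to("cuda")

init_image = Image.open("almonds_end_section.png").convert("RGB")   # hypothetical
depth_map = Image.open("depth_map.png").convert("RGB")
normal_map = Image.open("normal_map.png").convert("RGB")

result = pipe(
    prompt="overlapping bodies receding into depth",                # illustrative
    image=init_image,
    control_image=[depth_map, normal_map],   # one conditioning image per ControlNet
    strength=0.6,
).images[0]
result.save("controlnet_layer.png")
```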
The image resolution is also varied in some of the “runs”, which allows image-content clusters of different sizes to form. This process also uses tile-based ControlNets or specialized upscaling Generative Adversarial Networks (GANs) to edit individual image sections.
Upon completion of the automated components of this process, manual intervention is applied. The thirty layers generated by the previous process (each of them 270 megapixels) are compiled into a single Photoshop file. These layers are then manually interwoven, resulting in a comprehensive digital artwork that embodies both machine-generated and “handcrafted” elements.