Essay
  • Diffusions

  • Bart Lootsma

To kick off 2025, arc en rêve's online publication offers an incisive analysis of the new applications of image-generating AI. Between concern about the disintegration of reality and fear of being submerged in the kitsch of this imagery, between the suspicion of idiocy inherent in the self-generating structure of these machines and the opening up of an as yet unknown future for architectural practice, architectural theorist Bart Lootsma manages to maintain an admirable balance on a subject that easily invites the adoption of clear-cut positions.

 

Susan Sontag, in an Art Nouveau interior generated by an AI.

“The real is produced from miniaturized cells, matrices, and memory banks, models of control – and it can be reproduced an indefinite number of times from these. It no longer needs to be rational, because it no longer measures itself against an ideal or negative instance. It is no longer anything but operational. In fact, it is no longer really the real, because no imaginary envelops it anymore. It is a hyperreal, produced from a radiating synthesis of combinatory models in a hyperspace without atmosphere.” (Jean Baudrillard, 1981) [1]

“The streets no longer lead to fashion’s future; today trends break out on the internet.” (Dean Kissick, 2013) [2]

 

Since last year, with the introduction of Stable Diffusion, Midjourney and DALL-E, several computer programs have been available on the internet that produce sophisticated images when you enter a text. The images are detailed and show complex shapes, structures and textures, seemingly without effort. Working with these programmes is a lot of fun, especially because of the surprising combinations of the known and the unknown the software produces, because you don't know what is really happening, and because it is easy and delivers results incredibly quickly. The results improve almost every month when measured against their ability to convincingly depict in detail an apparent or possible reality. They will at least produce new aesthetic sensibilities and maybe even more.

Pictorial Turn

Web magazines have surpassed the numbers of subscriptions and page views that traditional architectural magazines used to have. ArchDaily boasts 285 million monthly page views, 17.9 million monthly visits, 3.4 million Facebook fans and 4.2 million Instagram followers.[3] Not even in their heyday did print magazines reach such numbers. The largest architectural magazines had maybe 30,000 to 60,000 subscriptions, never more, and today there aren’t many left with more than 10,000. The new web magazines have a global reach, both in terms of content and in terms of readers. Whether through photographs or through renderings, digital images are already the dominant and most important medium for communicating architecture today. The sophistication of these images in depicting detail, sharpness, textures, weather, and atmospheres is high and incredibly seductive, whether the project has been realized or not.

In reaction to the smoothness of most of these images, collage has made an unexpected comeback as a technique for communicating architectural ideas. As early as 2013, Pedro Gadanho curated the exhibition Cut ’n’ Paste: From Architectural Assemblage to Collage City at the Museum of Modern Art in New York, and ever since there has been a steady flow of exhibitions, publications, and symposia around the theme of collage in architecture. Sam Jacob already spoke of a return with a vengeance.[4]

According to Jacob, the comeback of the collage is a return to drawing after rendering software had taken that out of the hands of the architects. It’s part of a fight against the alienation new technologies cause. “Growing computational power was harnessed to produce rendered images -glossy visions of soon-to-be-built projects, usually blue-skyed, lush-leafed, and populated by groups of groomed and grinning clip-art figures, where buildings appeared with a polished sheen and lens flares proliferated.”[5]

This may very well be true, even if the software Jacob is referring to is also increasingly used to produce staggering dystopian fantasies about, and dark comments on, our built-up reality, not least as part of the special effects industry for moviemaking. This is where this software originally came from before it fell into architects’ hands in the famous Paperless Studio at Columbia University in the early 1990s. These fantasies may very well be considered a contemporary equivalent to drawing and painting. The cinematic dystopian architectural and urban worlds of Liam Young are a kind of homecoming in this respect.

Several Italian architects have also recently taken up the technique of collage again to communicate their ideas about architecture. The best-known examples are Carmelo Baglivo, Luca Galofaro and Beniamino Servino, but others, like Davide Trabucco, follow in their footsteps.[6] One of the advantages of collages is that one can introduce (parts of) photographic representations of reality, including styles, materials, and textures. But the biggest advantage is that making a collage goes relatively quickly. Even if Baglivo, Galofaro and Servino also produce books, these collages are posted first and foremost on social media platforms like Facebook and Instagram, where they seem to be more at home. They often take the form of memes: combinations of quickly remixed graphic ideas with simple texts that work within particular cultural discourses. In the Italian context, one can see different authors communicating with each other through collages. Text plays a minor role here.

Diffusion

It is in this context that AI programmes appear on the scene that can generate digital imagery as sophisticated as photographs and renderings, and produce it even more quickly than one could make a collage. Software like Stable Diffusion, Midjourney and DALL-E 2 does that for you when you just enter a text. The image appears in less than 60 seconds. The images are detailed and show complex shapes, structures, styles, and textures, seemingly without effort. Manifestations of nature (skin, hair, and greenery) are effortlessly depicted. This does not mean diffusion software is well suited to depicting an existing reality. When trying to produce, for example, a portrait of an existing person or city, it only works to a certain extent with examples that are famous in the United States, say Donald Trump or Manhattan seen from the Brooklyn Bridge, and even these appear with major faults. Images of less famous people or scenes sometimes bear only a faint resemblance to the original. Hands and texts are notoriously problematic, but Midjourney in particular is rapidly getting more realistic with every new version, as it works with the largest datasets.

Even if we speak of Artificial Intelligence, we should remember that the current text-to-image models, or diffusion models, are forms of “machine learning”. They are trained on extremely large datasets of captioned images, but these are not simply used as they are. During training, noise is added step by step, which basically destroys the original images, and the model learns to reverse that process. After that, it can start from noise, remove it, and “recognize” the prompted image in what remains.
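The adding and removing of noise can be made concrete with a small sketch. It is only an illustration under assumptions: it follows a commonly used formulation of diffusion models, not the internals of Midjourney or any other product mentioned here, and the names `add_noise`, `remove_noise`, `correlation` and `alpha_bar` are hypothetical. A list of numbers stands in for an image. In a real system, a trained network would have to estimate the noise; here the true noise is simply handed back to show that knowing it is enough to restore the original.

```python
import math
import random


def add_noise(pixels, alpha_bar, rng):
    """Blend an 'image' with Gaussian noise (illustrative forward step).

    alpha_bar close to 1 keeps most of the image;
    alpha_bar close to 0 leaves almost pure noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [
        math.sqrt(alpha_bar) * p + math.sqrt(1.0 - alpha_bar) * n
        for p, n in zip(pixels, noise)
    ]
    return noisy, noise


def remove_noise(noisy, estimated_noise, alpha_bar):
    """Invert add_noise, given an estimate of the noise that was added.

    A trained model would supply the estimate; this sketch has no model.
    """
    return [
        (x - math.sqrt(1.0 - alpha_bar) * n) / math.sqrt(alpha_bar)
        for x, n in zip(noisy, estimated_noise)
    ]


def correlation(a, b):
    """Pearson correlation, used here as a rough measure of resemblance."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


if __name__ == "__main__":
    rng = random.Random(0)
    # A smooth wave stands in for the pixel values of an image.
    image = [math.sin(i / 5.0) for i in range(200)]
    for alpha_bar in (0.99, 0.5, 0.01):
        noisy, noise = add_noise(image, alpha_bar, rng)
        restored = remove_noise(noisy, noise, alpha_bar)
        print(alpha_bar, correlation(image, noisy), correlation(image, restored))
```

The lower `alpha_bar` gets, the less the noisy version resembles the original, which is the sense in which noise destroys the images. What the model learns is the estimate of the noise, conditioned on the text, and that estimate is never perfect, which is one reason results deviate from any particular source image.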

If we want a realistic depiction of someone or something, we had still better go to Google Image Search. We are far removed here from programs like ChatGPT, which, with some reservations, might come close to delivering results that compete with Google and Wikipedia and which, in their ability to build up arguments or hold conversations, even go beyond their predecessors.

At the same time, we should not understand these AI programs as forms of intelligence that can generate something really new and unexpected; they only produce other orderings of existing things. Only every now and then something may go wrong and something unknown may emerge by chance. It is then said that the programmes hallucinate.

When working with diffusion models, one tries to understand how one can control the technology after all, but it's not that simple. It's always about the "prompt": the text that sets everything in motion. But prompts do more than that: they also frame the result and can control its content and its aesthetics to a certain degree. "Prompt engineers" are already very experienced at achieving results that come close to the expected result or even go beyond it. Websites like PromptHero show examples and offer courses in formulating prompts to achieve ever more perfect results, whatever perfection may be in this case. If it’s about achieving a detailed, seemingly realistic image, that might soon work out. If it’s about realizing an image that comes close to an image one has in mind from the beginning, it might remain problematic. But one of the more amusing aspects may very well be that the model produces something unexpected.

Learning and unlearning

The learning material defines many of the biases of diffusion models. The datasets are provided by organizations like the German non-profit LAION and the American Common Crawl. The latter collects 3 billion Internet pages per month. According to The Guardian, “Researchers at LAION took a chunk of the Common Crawl data and pulled out every image with an “alt” tag, a line or so of text meant to be used to describe images on web pages. After some trimming, links to the original images and the text describing them are released in vast collections: LAION-5B, released in March 2022, contains more than five billion text-image pairs. These images are “public” images in the broadest sense: any image ever published on the internet may be gathered up into them, with exactly the kind of strange effects one may expect.” [7]
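How image links and their descriptions can be pulled out of a web page is easy to sketch. The following is a minimal illustration using Python's standard `html.parser` module, not LAION's actual pipeline; the class `AltTextCollector`, the function `extract_pairs` and the sample page are hypothetical. It keeps an image only when it has both a source link and a non-empty "alt" text, which is one plausible reading of the "trimming" the quote mentions.

```python
from html.parser import HTMLParser


class AltTextCollector(HTMLParser):
    """Collect (image link, alt text) pairs from an HTML page."""

    def __init__(self):
        super().__init__()
        self.pairs = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        # An alt attribute without a value is reported as None.
        alt = (attributes.get("alt") or "").strip()
        # Skip images that lack a link or a usable description.
        if src and alt:
            self.pairs.append((src, alt))


def extract_pairs(html):
    """Return the text-image pairs found in one page of HTML."""
    collector = AltTextCollector()
    collector.feed(html)
    return collector.pairs


if __name__ == "__main__":
    page = (
        '<p><img src="a.jpg" alt="Art Nouveau interior">'
        '<img src="b.jpg">'
        '<img src="c.jpg" alt="  "/></p>'
    )
    for src, alt in extract_pairs(page):
        print(src, "|", alt)
```

The sketch makes one thing visible: the description attached to an image is whatever the page's author happened to write, so the habits and blind spots of web publishing pass directly into the training material.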

Still, Midjourney's learning material clearly has its focus on American examples. Then comes Europe, and eventually the rest of the world. It affirms the biases of Western society and has clear racial and gender prejudices. If one wants a female professional or a person of color in an image, one must put that into the prompt explicitly.

On top of that, all representations have crucial flaws. It is not without reason that the first thing you type in the Midjourney bot is /imagine. It produces an imaginary world, a possible world, a proto-surrealist world, the laws of which are the laws of Alfred Jarry’s ‘pataphysics, a physics of the possible beyond metaphysics. It’s a world without moral impetus, apart from the biases and censorship introduced by its makers. The censorship reflects a morality that is currently dominant in the US: violence is allowed, while any word that might vaguely point at love is forbidden, even where names are concerned. Still, it’s an endless source of creativity.

In fact, the resulting images are mainly communicated on the Internet and function much like memes in social networks. Intriguingly, Midjourney is even accessed through a social platform, Discord, which was originally developed for online gaming. All images one produces with Midjourney also appear automatically on Discord. They can be downloaded for other purposes from one’s own personal homepage at Midjourney.com. Since these images will themselves become part of the datasets, this might produce biases to worry about, much as opinions on social media can grow into bubbles.

Blind eyes

The new text-to-image software attracts an immense amount of attention, not just in professional magazines and in universities, but also in the daily press. At the moment of writing this essay, there’s hardly a day without an article on AI in the popular press. There are probably countless other ways in which artificial intelligence is changing and could change our world, some obvious, some more hidden, but it’s the strong visual impact of Stable Diffusion, Midjourney and DALL-E 2 and their easy accessibility and use that makes people jump on them. Ever since Le Corbusier accused architects of having “eyes that do not see” a hundred years ago, architects have done their best to be early adopters of new technologies. The visual aspect is essential here. Le Corbusier thought new technologies would mainly change the way buildings would be organized and constructed and therefore the way they looked. It was only much later that new technologies also forced him to change the organization of his office.[8] Today, it seems to be the other way around. Developments in computerization in architecture since the 1990s have completely changed every architectural practice, even if it’s not always obvious whether this has changed the way architecture looks, apart from those cases in which architects consciously introduced computation in the earliest part of the design process. Although there are exceptions that are much celebrated, the inherent conservatism of the building industry still slows down the realization of such projects. Also, the design of such projects is still a decent amount of work for skilled architects. The introduction of software that generates sophisticated images of architectural designs from the start makes architects willing to speculate about the possible impact of AI on their work to keep up with developments.

There are still some problems to be solved to achieve the results architects are waiting for, notably the current impossibility of relating the imagery to plans and sections. Also, it’s not yet possible to insert the AI-generated project in a concrete situation. No doubt there are and will be solutions for that. The fear that AI will take away work and make many people superfluous seems to have vanished.

Text prompts

One of the most intriguing aspects of text-to-image software is the new relation between text and image. The prompt is a shorter or longer text. It’s a command that generates the image, no longer a description of an image that is already there. Similar phenomena play a role in illustration and in conceptual art. Of course, illustrations to a scientific text or to a manual are supposed to be as precise as possible, but those made for a newspaper article, children’s books, or comics are much more open to the personal interpretation of the artist. This seems one of the most promising fields in which text-to-image software can find a use. Cartoons and caricatures, which exaggerate a certain situation, are other options.

In art, the title or description is usually added after the visual work has been realized. The idea is that the visual work speaks for itself, even if that’s not necessarily the case. From the end of the nineteenth century and particularly in the twentieth century, the title and even longer texts relating to the visual work became more important. In conceptual art, the complex relationship between image and text became a recurring issue. This is already evident in the work of Marcel Duchamp, who changed the meaning of everyday objects by putting them in an art context and adding a title, often in the form of a pun. Duchamp’s Green Box from 1934 is already more ambivalent, as it contains notes and sketches related to his magnum opus The Bride Stripped Bare by Her Bachelors, Even, or Large Glass, on which he worked between 1915 and 1923. Some of the notes and sketches in the Green Box anticipate parts of the Large Glass, some are facsimiles, some describe or depict parts of the Large Glass that were never realized, some relate to other works and thus embed the Large Glass in an even larger universe. The combination of the Large Glass (which remained unfinished and was accidentally broken) and the Green Box produces a complex world of ideas, open to different interpretations. But Duchamp also collected his puns as works in themselves, published them, and recorded a spoken version of them, triggering the imagination of the audience in another way. In the 1960s and 1970s, artists as different as Joseph Kosuth, Robert Barry, Lawrence Weiner, Marcel Broodthaers, Sol LeWitt, Joseph Beuys and many others would produce works that consisted either of texts alone or of texts by means of which someone else could realize a work, maybe even in different contexts.

In his book The Second Digital Turn, Mario Carpo reminds us that before the Renaissance, ‘the main vehicle for recording and transmission of visual data was verbal, not visual: images were described using words; written words were forwarded in space and time, images were not’. And he refers to Isidore of Seville, who epitomized the ancient mistrust of all forms of visual communication, and stated that ‘images are always deceitful, never reliable, and never true to reality’.[9] If it is true that, as Carpo writes, ‘the rapid progress of contemporary digital technologies from verbal to visual to spatial media in the course of the last thirty years curiously reenacts, in a telescoped timeline, the entire development of Western cultural technologies’, many of these issues will probably be solved.[10]

I notice in my own experiments that some people can now be made to believe that the results are photos when I post them on Facebook or Instagram. For example, there’s a series in which I prompted Midjourney to generate young versions of famous architects, with attributes that are semi-related to certain familiar narratives associated with them. Most people, of course, do not know what these people looked like when they were young. Nevertheless, many accept the suggestion. Mostly, one finds only vague hints of the real persons in them, as many or as few as if one had a current or timeless portrait made. The only difference is that people accept it more when they are portrayed 'young' in a photorealistic way, because most people looked different when they were young. The other way around, I notice that people start doubting real photos when I post them after Midjourney images. This is understandable, as many of these images have been photoshopped before they were posted or printed. This anticipates some of Midjourney’s aesthetic biases and has prepared us to accept them. 

Acceptance may have a lot to do with the speed and superficiality of these media, with the textual descriptions, and not least with what people want to see or accept as true. The role of the descriptions is central here: they are not added later, but they are the origins of the images. In this way, they also challenge us to see the images as realisations of these commands. At the same time, Midjourney makes it clear that not everything is to be understood as text and that a linguistic summary of a reality or idea is always a simplification. The images are much richer in information than the prompts.

Guilty Pleasures

The deceitfulness and unreliability of Midjourney images are also inherent to the very essence of diffusion models. They feed on the Internet and feed the Internet themselves in an incestuous process. It’s all Simulacra and Simulations, as Jean Baudrillard would say. He wrote already in 1981 that “today abstraction is no longer that of the map, the double, the mirror, or the concept. Simulation is no longer that of a territory, a referential being, or a substance. It is the generation by models of a real without origin or reality: a hyperreal”.[11] In the case of text-to-image software there may be millions or even billions of origins to the produced image, but these are all blurred and deconstructed. Baudrillard defines the successive phases of the image as first, the reflection of a profound reality; second, the masking and denaturing of a profound reality; third, the masking of the absence of a profound reality; and finally, the phase in which the image has no relation to any reality whatsoever and becomes its own pure simulacrum. This is obviously the phase we have reached now.[12]

By far the majority of images generated by the new AI programmes belong unmistakably to the categories of fantasy, science fiction and horror, including the scary psychedelic colours that go with them. Areas, in other words, that traditionally already consist of a mixture of exaggerated realism, historical references and complete nonsense. As Roland Barthes already wrote about Martians, the whole psychosis is based on the myth of the identical, the double.[13] This is more than satisfied by Midjourney. Midjourney's strength, its incredible detail and richness of texture, also becomes a weakness here. The images generated, precisely because of the abundance of clichés, details, textures and moods, inevitably become kitsch. And according to Umberto Eco, kitsch is "the ideal food for an indolent public that wants to access and enjoy beauty without having to try too hard."[14]

Does this mean that Midjourney is fundamentally useless? On the contrary. We are only at the beginning, even if, as the name suggests, we are in the middle of the journey. And this journey is as fascinating as it is dangerous. The best thing to do, instead of having Martians designed, is to enter this world as Martians ourselves, like a foreign planet on which we try to get along in all innocence. I suppose many people who enjoy working with Midjourney know it is a hyperreal world of simulacra, and they know that a large part of the production is kitsch. They consider this tongue in cheek as a guilty pleasure, in other words: as a form of Camp.

According to Susan Sontag, Camp is a style that is ironic, theatrical, and exaggerated, characterized by a love of the unnatural, artifice, and the artificial. She argues that camp is a way of seeing things that goes beyond mere style or taste and that it involves a certain degree of aestheticism and frivolity. She also notes that camp is closely related to the concept of "bad taste" and that it often involves an appreciation for things that are traditionally considered low or vulgar. In fact, many of the examples of Camp Sontag gives in her famous essay could be Midjourney favorites. Under version 3, results often combined a kind of impressionist painting style from around 1900 with a preference for Art Nouveau-like forms. Sontag calls Art Nouveau the most typical and fully developed Camp style. “Art Nouveau objects, typically, convert one thing into something else: the lighting fixtures in the form of flowering plants, the living room which is really a grotto. A remarkable example: the Paris Metro entrances designed by Hector Guimard in the late 1890s in the shape of cast-iron orchid stalks.”[15] Sontag argues that camp is often most effective when it appropriates elements of low culture, transforming them into something that is both ridiculous and sublime. In most cases, this is exactly what the diffusion models do. Sontag also sees camp as a mode of cultural production that is both celebratory and critical, a way of embracing and reveling in the absurdity and excess of modern life while simultaneously exposing the artifice and artificiality that underlie it.

The enormous impact of text-to-image models will probably change aesthetic sensibilities in architecture and design. And maybe someday we can design with this confusing new AI-infused software and thus project it back into reality. After all, in the early 1990s, when special effects software like Maya only ran on extremely expensive Silicon Graphics machines, what is now normal in everyday use could not be expected immediately either. And this development is accelerating. “The streets no longer lead to fashion’s future; today trends break out on the internet,” wrote Dean Kissick of the fashion magazine i-D. The same will be true for architecture, design and probably the whole of visual culture.[16]

 

 

 

 

This text was originally written for the book Diffusions in Architecture, edited by Matias del Campo and published by Wiley in 2024. Slight updates were made for publication in Daidalos 23/24, 2024.

 

 

 

 



[1] Baudrillard, Jean. Simulacra and Simulation, The University of Michigan Press, Ann Arbor, 1994. P. 2.

[2] Kissick, Dean. Didn’t I see you on the cover of i-D?, i-D 326, Pre-Fall 2013, The Street Issue. 

[3] https://www.archdaily.com/content/about?ad_source=jv-header&ad_name=hamburger_menu, Accessed 20230317

[4] Jacob, Sam. Architecture Enters the Age of Post-Digital Drawing, Metropolis, http://www.metropolismag.com/architecture/architecture-enters-age-post-digital-drawing/, accessed 20170716

[5] See note 4.

[6] Ferrando, Davide Tommaso; Lootsma, Bart; Trakulyingcharoen, Kanokwan. Italian Collage, Lettera Ventidue, Siracusa, 2020.

[7] Bridle, James. The stupidity of AI, The Guardian, 20230316, https://www.theguardian.com/technology/2023/mar/16/the-stupidity-of-ai-artificial-intelligence-dall-e-chatgpt?fbclid=IwAR3uIea7PVtxFIwqU-bTK8guAWrVJTpD6WbiByn8qo5wy_KK14k88Hnn1Ns, accessed 20230317

[8] Michels, Karen. Der Sinn der Unordnung. Arbeitsformen im Atelier Le Corbusier, Vieweg, Braunschweig/Wiesbaden, 1989.

[9] Carpo, Mario. The Second Digital Turn, Design Beyond Intelligence, MIT Press, Cambridge (Mass.)/London, 2017. Pp. 102-103.

[10] Idem.

[11] Baudrillard, Jean. Simulacra and Simulation, The University of Michigan Press, Ann Arbor, 1994. P.1. 

[12] Baudrillard, Jean. Idem. P.6.

[13] Barthes, Roland. Marsmannetjes, Mythologieën, Uitgeverij IJzer, Utrecht, 2002. Pp. 43-45.

[14] Eco, Umberto. Die Struktur des schlechten Geschmacks, in: Im Labyrinth der Vernunft, Texte über Kunst und Zeichen, Reclam, Leipzig, 1990. P. 246.

[15] Sontag, Susan. Notes on Camp, in: Against Interpretation and other essays, Farrar, Straus & Giroux, New York, 1966. P. 279.

[16] Kissick, Dean. See note 2. 

Bart Lootsma

Bart Lootsma is a historian, theorist, critic and curator in the fields of architecture, design and the visual arts. He has published Media and Architecture (1997), SuperDutch (2000), Reality Bytes, Selected Essays 1995-2015 (2015) and Italian Collage (2020). He curated the ArchiLab 2004 exhibition in Orléans, the Montenegrin pavilion at the 2016 Venice Biennale and Radical Austria, Everything is Architecture. For arc en rêve, he contributed to the exhibitions Insiders in 2010 and New Forms of Collective Housing in Europe in 2008.