Listening as Ethics: Divergent Ontologies of Sound in Japanese Ma, Cage, and Schopenhauer

Takahide Tanaka

 

Abstract
Listening is an ethical mode of being in the world, and the ontology of sound perception diverges fundamentally across cultures. Western musical cognition is rooted in acoustic perspective (AP), a hierarchical spatial schema of pitch that aligns with Schopenhauer’s view of music as the structured expression of the will. John Cage, informed by Zen philosophy, sought to liberate sound from intention, urging the listener to hear sonic events as pure presence. Japanese auditory culture represents a third position: sound as voice―a relational emergence between self and environment. This relational ontology is expressed through Ma (間), the generative interval in which meaning arises. Neuroscientific studies of insect-sound processing and linguistic evidence from the kana system corroborate this orientation. Yet contemporary performance practice in Japan often prioritizes centrifugal projection, reducing acoustic depth and complicating engagement with Western polyphony. This paper argues that Japanese listening requires a dual auditory literacy that honors relation while engaging structural depth.

Key Words
acoustic perspective; John Cage; cross-cultural auditory cognition; ethics of sound; Japanese listening; Ma; philosophy of music; Schopenhauer

 

1. Introduction

The act of listening is not merely a sensory event but a culturally and philosophically conditioned mode of being in the world. Different cultures cultivate distinct expectations of what sound is and what listening does. In Western art music, pitch relations are often understood as constituting a form of spatial depth, an acoustic perspective (AP) in which lower pitches function as foundational and higher pitches as derivative. This model resonates with Schopenhauer’s conception of music as the hierarchical manifestation of the will.

This paper approaches listening as a fundamental ontological and cognitive act, while treating performance as the embodied manifestation of that listening. How one listens shapes how one performs: auditory ontology becomes somatic practice. The cross-cultural frictions discussed in what follows therefore concern not simply technical execution but divergent listening premises that are enacted in musical practice.

Japanese auditory culture provides a particularly revealing site for examining these frictions. Interpretations of Western classical music in Japan are sometimes described as lacking structural depth or exhibiting a form of perceptual flatness. Rather than treating this phenomenon as a technical deficiency, this paper reframes it as an instance of ontological divergence among different listening orientations.

The discussion focuses on three paradigms: Western structural listening grounded in AP; John Cage’s Zen-inspired, nonintentional listening; and a relational listening articulated through the Japanese concept of Ma (間). The aim is not to rank these orientations but to clarify how each presupposes a different ontology of sound and a different understanding of the listener’s role in the emergence of meaning.

This study argues that Japanese relational listening constitutes a third ontological mode distinct from both Western structuralism and Cage’s erasure of meaning. Through comparative philosophical analysis, supplemented by neuroscientific findings and linguistic evidence, the paper examines how the listening body participates in the co-creation of world and meaning. Although the discussion is grounded in Japanese auditory culture, the question it raises is general: what happens when musical works are performed in contexts where the listening premises presupposed by their composers are no longer shared?

2. Schopenhauer: the structured world of acoustic depth

Schopenhauer is selected here not as a historical relic but because his metaphysical account of music as the “objectification of the Will”[1] provides the most fundamental and enduring articulation of the hierarchical, structural logic that continues to underpin Western AP. Even in contemporary performance practice, this nineteenth-century structuralist paradigm remains the implicit operating system against which other ontologies must be measured. Schopenhauer famously regarded music as the direct objectification of “the Will,” a metaphysical force that underlies all phenomena.[2] In his schema, the hierarchical organization of sound mirrors the hierarchical structure of reality itself: bass tones correspond to the most fundamental layers of existence, harmonic structures articulate the ordering of intermediary forms, and melody expresses the individuality of phenomenal experience. Sound, therefore, never exists in isolation but acquires its ontological status through its place within a predefined structural hierarchy―a hierarchy that precedes and governs the very possibilities of musical meaning.

Crucially, this metaphysical framework is not only theoretical but embodied through musical performance. In Western vocal and instrumental practices, internal resonance is cultivated before projection. Singers expand the pharyngeal and thoracic cavities; instrumentalists optimize the inner airflow and resonance of their instruments. Low frequencies are experienced as physically grounded and close to the body’s core, while high frequencies feel outward and distant. Through such centripetal bodily engagement, performers construct an auditory depth from within, making hierarchy not merely a perceived structure but a lived spatial form.

This embodied spatialization aligns with the listener’s cognition. The perceptual schema that treats low pitches as proximal and high pitches as distal reconstructs an AP analogous to linear perspective in visual art.[3] Listening becomes an act of world-structuring: the listener re-establishes a stratified depth in which sounds derive identity from their relative positions.

For Schopenhauer, music’s ethical and metaphysical value lies precisely in this structured relation. To listen is to encounter the will in a mediated, ordered form, recognizing the world’s hidden hierarchy made audible. Within this framework, Western musical aesthetics presuppose that depth is inherently bound up with meaning, that hierarchy functions as the primary principle of order, and that structure constitutes the very mode in which the world exists. Sound becomes meaningful insofar as it participates in a pre-given architecture of relations; listening is therefore a commitment not only to sound itself but to the order that stands behind sound.

3. John Cage: the erasure of meaning and the presence of sound

John Cage, by contrast, rejects hierarchical structuring altogether. Influenced by Zen philosophy,[4] he argued that sound should be released from intentional control and symbolic function. For Cage, the task of listening is to encounter each sonic event as pure presence, prior to categorization or conceptualization. Listening, in this sense, aims at the suspension of meaning rather than its construction. His famous declaration—“I have nothing to say and I am saying it”[5]—encapsulates a radically different orientation: music should not represent the world nor organize it. Instead, it should allow the world to sound.

Cage’s practice is grounded in his interpretation of Zen philosophy,[6] which teaches that phenomena arise and vanish without inherent meaning. From this perspective, musical order is not discovered but imposed by the human will; hierarchy itself becomes a form of violence against sound’s natural state. In rejecting the metaphysical role of structure, Cage also rejects Schopenhauer’s premise that meaning originates from relational position within a hierarchy. This philosophical stance is manifested in nonintentional operations (for example, the use of the I Ching), in the incorporation of everyday objects and environmental noise, and in works such as *4’33”*,[7] which remove traditional performance elements in order to expose listening itself. By eliminating pitch relations, harmonic expectations, and narrative direction, Cage effectively erases acoustic depth: no sound stands before or behind another, and all sounds are equally present. Meaning is not discovered in distance or relation but in the immediacy of the sonic event.

It is crucial, however, to distinguish Cage’s Zen-inspired listening from the mode of listening characteristic of Japanese auditory culture. Although both are sometimes associated with Zen discourse, Cage’s approach seeks the suspension of intention and the reduction of meaning, allowing sounds to occur as neutral events rather than as carriers of expression or relation.[8] Japanese listening, by contrast, does not negate meaning or intentionality. Instead, it treats sound as relational and often as voice, emerging within a shared field of self, environment, and history—a dynamic articulated through the concept of Ma.[9] Whereas Cage’s listening aims to hear the world without attachment or symbolic overlay, Japanese auditory practice tends toward the fullness of relation, allowing meaning to arise through spatial and temporal interdependence. The decisive difference, therefore, lies not in the presence or absence of Zen influence but in the status of meaning itself: Cage suspends meaning in order to let sound occur, while Japanese listening attends to the space in which meaning emerges.

4. Japanese listening and the ethics of Ma

However, it is important not to romanticize this third position of Ma as a form of innate cultural superiority. Within contemporary Japanese musical practice, this relational orientation can sometimes function not as reflective attunement but as an unexamined default. In such cases, performers may fail to engage rigorously with the structural demands of Western polyphony (AP) or with the radical nonintentional discipline of Cage, and instead rely upon a familiar sense of “relationality.” This is not simply a matter of aesthetic preference; it risks overlooking the alterity of the musical work by assimilating it into a culturally habitual mode of listening. The call for dual literacy proposed in this paper is therefore not a celebration of Japanese uniqueness but an ethical corrective against this tendency toward unreflective assimilation. Throughout this paper, a distinction is maintained between performer-centered listening, which foregrounds individual expressive intention, and work-centered listening, which treats the sonic event as a relational field in which meaning emerges.

Listening, as discussed here, is not ethical in the sense of moral judgment or normative prescription. Rather, it is ethical insofar as it determines how one positions oneself in relation to the emergence of meaning. In this context, listening should not be understood as a purely internal or contemplative activity. Rather, it functions as the mediating condition through which compositional assumptions are translated into concrete performance decisions. How a performer listens determines how pitch relations are spatialized, how vocality is conceptualized, and how sound is projected into the acoustic field. Listening, therefore, directly shapes performance practice rather than merely accompanying it.

This study adopts a work-centered―and more precisely, a relation-centered―mode of listening. From this perspective, the ethical dimension of audition becomes visible not as a question of correctness or interpretation but as a responsibility for how relations among sound, listener, and environment are allowed to form.

As noted above, Japanese listening differs fundamentally from both the Western structuring of sound and Cage’s erasure of meaning. Rather than conceptualizing sound as a spatial hierarchy or dissolving meaning into pure presence, Japanese auditory culture often perceives sound as voice: a communicative and affective emergence between self and environment. Sound is not merely something that occurs; it addresses the listener.

This distinction has direct consequences for performance practice. When a composer presupposes singing as a structured vocal act within a hierarchical pitch space, listening already functions as the condition that shapes how such vocality is realized. If the performer approaches the passage through a listening posture that treats vocal sound as relational speech or environmental resonance, the result is not merely an interpretive variation but a transformation of the work’s ontological basis. What is at stake, therefore, is not stylistic freedom but a mismatch between the listening premises presupposed by the composition and those enacted in performance.

This relational auditory ontology becomes evident in the concept of Ma, often translated as ‘gap’ or ‘interval.’ However, Ma is not absence or silence; it is the generative field in which relations come into being.[10] In Ma, meaning does not preexist sound, nor is meaning imposed upon it. Rather, the listener participates in the arising of meaning through attentive openness. The world is neither structured in advance, as in Schopenhauer, nor emptied of structure, as in Cage; it is co-created between sound and listener.

This co-creative process is intimately embodied. Traditional Japanese vocal and instrumental practices do not distinctly separate internal resonance from external projection. The performer’s body is not a vessel that shapes sound before sending it outward; it is already in relation with the surrounding acoustic field. Resonance is experienced not as internal depth opposed to external space but as a continuity between inner and outer vibration, and importantly, this does not imply a fixed biological difference but reflects culturally and linguistically shaped listening habits that remain learnable and reversible.

Such listening entails an ethics: the responsibility to hear the world not as an object to be controlled nor as a void to be emptied but as a partner in relation. Ma is the site of this ethical attunement. To listen is to welcome what is emerging, to leave open the possibility of meaning rather than determining or erasing it.

Thus, Japanese listening constitutes a third ontological possibility: not the structured world of Schopenhauer, nor the vacated world of Cage, but a shared world that forms in the act of listening. In Ma, the listener does not stand apart from sound; the listener becomes the relation through which sound, self, and world are brought into coherence.

5. Neuroscientific corroboration: evidence from cross-cultural auditory studies

Before turning to the empirical evidence, it is necessary to restate the distinction between the Zen-influenced aesthetics of John Cage and the relational listening of Japanese Ma. While Zen has deeply informed Japanese aesthetics, it is not synonymous with the entirety of Japanese auditory culture, nor is it indigenous in origin. Cage’s appropriation of Zen, centered on “emptiness” and the liberation of sound from human intent, represents an ontology of pure presence that seeks the cessation of meaning. In contrast, as argued above, the Japanese concept of Ma is fundamentally relational, focusing on the meaningful interval and the co-creation of significance rather than the cessation of meaning. Thus ‘Zen,’ in this context, refers specifically to the lineage of Cage’s experimentalism, which must be clearly differentiated from the distinct cultural and somatic orientation of Japanese listening rooted in relationality.

In Japanese auditory culture, however, listening does not conform neatly to either Western structural listening or Cage’s Zen-inspired paradigm. Natural sounds such as insect calls are often perceived as voices, carrying communicative or affective significance, rather than functioning as neutral acoustic objects. This relational orientation is expressed in the concept of Ma, the generative interval in which meaning and emotion emerge through the dynamic relation between sound, self, and environment. Listening here is neither the structuring of an independent sonic space nor the negation of meaning but an attunement to relations as they come into being.

Furthermore, this study aims to dismantle the common Western misconception that views Japanese musicality through a lens of vague mysticism. When Western observers label a Japanese interpretation of Western music as unique or spiritual, they often overlook the underlying cognitive friction among discordant auditory ontologies.

This observation aligns with comparative studies in music psychology, which suggest that auditory perception is not a neutral faculty but is shaped by the structural priorities of one’s primary linguistic and musical environment. For instance, research has indicated that while Western listeners are conditioned to prioritize harmonic hierarchy and vertical integration, listeners from different cultural backgrounds may focus more on melodic contour and the vocal quality of individual tones.[11] In this light, the difficulty some Japanese performers encounter in projecting a three-dimensional acoustic perspective (AP) is not a lack of mastery but a persistent cognitive interference from a relational, two-dimensional auditory habit. Mystical exoticization of the kind described above serves only to obscure the reality of the performer’s struggle. True mutual understanding requires both sides to move beyond superficial fascination and acknowledge the rigorous ethical effort required to bridge these different ways of hearing. The neuroscientific evidence presented below, therefore, is not a proof of mysterious cultural difference but a map of the objective cognitive hurdles that must be consciously overcome to achieve genuine dual literacy.

While the preceding sections articulated a philosophical account of Japanese listening, empirical evidence suggests that culturally conditioned auditory perception is grounded in distinct neurocognitive processes. Neuroscientific research has demonstrated that the auditory system does not merely register acoustic features; it interprets them within culturally learned frameworks of meaning.

This cultural framework is most vividly tested in the dynamic between (A) the intrinsic structural demands of a musical work and (B) the performer’s culturally conditioned somatic intent. While the former (A) may require a hierarchical AP, the latter (B) is often rooted in the relational ontology of Ma. The following neuroscientific evidence should be understood not as a claim of biological superiority or fixity but as an illustration of the deep-seated auditory habits that a performer must navigate. For a musician, the challenge is to mediate between these two layers: to respect the objective structure of the work (A) without losing the relational sensitivity (B) that defines their own being. This friction is where the ‘ethics of listening’ manifests in practice.

A key example concerns the perception of insect calls. Tsunoda and Uehara showed that native Japanese listeners tended to exhibit left-hemispheric activation, typically associated with linguistic and social processing, when hearing cricket sounds, whereas Western listeners more frequently demonstrated right-hemispheric activation, associated with acoustic monitoring.[12] These patterns do not indicate biological difference but suggest that culturally habituated listening practices influence how auditory stimuli are neurally organized. These findings corroborate an earlier theoretical proposal by Tsunoda,[13] who argued that the Japanese auditory system remains attuned to the intentionality of sound even in nonlinguistic contexts. Although aspects of his ‘Japanese Brain’ hypothesis require caution, contemporary evidence suggests that cultural factors do modulate the functional organization of auditory perception.

Further support comes from cross-cultural neuroimaging studies. Ikeda and Imada demonstrated that Japanese and Western listeners respond differently to environmental sounds, with Japanese participants showing stronger activation in networks involved in affective assessment and contextual meaning.[14] These results align with the relational ontology described earlier: sound is heard not as detached object but as an element in a shared field of meaning.

This relational mode of listening has direct implications for musical performance. Because Japanese auditory cognition prioritizes communicative intentionality over spatial structuring, performers may not automatically internalize the hierarchical depth that underpins Western polyphony and harmonic functions. What Western musicians perceive as foreground-background stratification―a core requirement of counterpoint―may instead be heard as a flattened field of co-present voices. Thus, the difficulty some Japanese performers encounter in differentiating harmonic roles is not a technical limitation but a consequence of a distinct auditory ontology. This observation is supported by cross-cultural studies in music cognition, which indicate that the perception of vertical harmony and polyphonic hierarchy is not a universal given but is shaped by the listener’s primary musical environment. For instance, Western-trained ears are conditioned to prioritize vertical integration and tonal stability, whereas listeners from different cultural backgrounds may focus more on melodic contour or the affective quality of individual vocal lines.[15] This cognitive dissonance between a flattened relational hearing and a hierarchical structural requirement (A) provides a clear empirical basis for the performer’s struggle.

Moreover, these empirical findings offer a provocative possibility for placing this philosophical divergence in conversation with theoretical physics. From this perspective, Western AP could be seen as resonating with classical and relativistic spacetime frameworks, in which entities possess identity through their coordinates in a pre-given structure. In contrast, Japanese Ma may find a conceptual parallel in relational models in contemporary quantum gravity, where spacetime itself is theorized to emerge from dynamic relations.[16] While these remain speculative analogies, the neural evidence for relation-first listening invites us to consider whether such aesthetic variations might reflect two divergent ontological models: one based on positional hierarchy and the other on emergent relationality.

Together, these studies reveal three implications central to the present argument. First, sound is not neutral: Japanese auditory processing evaluates its potential meaningfulness. Second, meaning is relational: neural activation reflects sensitivity to the presence of others―whether human or not―within the acoustic field. Third, the listening body is culturally shaped: neurocognitive structures supporting relational listening are learned through practice, not biologically predetermined.

Neuroscience does not explain Ma. But it shows that the body is predisposed to receive Ma―to inhabit a sonic world where meaning begins in relation, rather than in structure or its absence. Thus, empirical evidence supports the plausibility of the claim that Japanese listening constitutes an ethical mode of being―one in which the listener co-creates the world by listening into the living interval where sound and meaning emerge.

From a global perspective, Ma may be perceived as a form of ambiguity―a lack of the clear, hierarchical boundaries required by Western structuralism. However, it is precisely this ambiguity that necessitates an ethics of listening. Unlike Schopenhauer’s mode of objective control or Cage’s mode of total self-negation, Ma places the listener in a precarious, relational “between-ness.” The ethical challenge lies in not retreating into a comfortable, subjective ambiguity but in maintaining a rigorous tension between one’s own relational habits (B) and the work’s objective demands (A). Therefore, Japanese listening is not inherently ethical by nature; it becomes ethical only when the performer takes responsibility for this friction, refusing to domesticate the work into a familiar, flattened silence.

It is crucial to distinguish between two forms of relationality. The first is a passive, culturally conditioned relationality―the tendency to hear sound only as a familiar, affective presence. In the hands of an unreflective performer, this leads to the domesticating of Western masterworks into a flattened, subjective experience. The second is a transformative, ethical relationality, which constitutes the dual literacy proposed in this paper. This mode does not simply rely on the Japanese sense of Ma; rather, it uses the tension inherent in Ma to bridge the gap between the listener’s somatic habits (B) and the work’s structural alterity (A). In this sense, relationality is not merely a description of Japanese listening but a rigorous methodology for engaging with the world’s diverse ontological structures.

6. From script to sound: semiotic contrast and auditory ontology

The cultural distinction in listening extends profoundly into the realm of linguistic notation, demonstrating that semiotic systems reinforce the underlying auditory ontologies. Japanese kana, as a phonologically fixed writing system, assigns a stable sound value to each grapheme.[17] A character reliably corresponds to the same syllable regardless of its surrounding context. In this system, sound is treated as something that precedes structure and is simply represented in writing. Sound is first―and writing is merely its trace. In contrast, the Latin alphabet functions relationally.[18] The sound of a letter often changes according to what precedes or follows it, for example, c shifting between /k/ and /s/; a adapting in ‘father,’ ‘cat,’ and ‘made.’ Letters do not possess inherent sonic identity; their phonological value emerges only through contextual relations. Structure is first―and sound must adapt to it.

This semiotic contrast mirrors the divergent auditory ontologies described throughout this paper, where Western AP generates meaning through spatial hierarchy, whereas Japanese Ma allows meaning to emerge through relation within the interval. The same distinction manifests in musical scale processing. Within Western structural listening, the scalar system is apprehended primarily in terms of intervallic distances and particular attention is directed toward the location of semitones in the diatonic scale. Hierarchical tonal functions then arise from these patterned asymmetries. By contrast, many Japanese performers are observed to prioritize the faithful reproduction of the set of scalar tones as such, with less spontaneous emphasis on the hierarchy produced by semitone placement. What is foregrounded is the relational field of tones, not their ranking within a depth-oriented architecture. Music becomes recognizable through relation, yet the direction of that relationality differs from that encoded within Western tonal practice.

This does not imply that Japanese listeners lack the ability to perceive structural relations. Rather, their default mode of audition values sound as a relational voice―a someone―before it is coded into a spatial hierarchy. The sound calls; the listener answers. While the relational nature of Japanese listening was established in previous sections, its impact on performance manifests as a specific ontological friction. When a performer prioritizes the voice (B) over structural position (A), the hierarchical depth required by Western counterpoint is actively flattened. Consequently, the distinct layers of a polyphonic work do not settle into a foreground-background stratification; instead, they emerge as a field of co-present lines. For the Japanese performer, the challenge is to consciously construct a three-dimensional depth within a sensory system that gravitates toward two-dimensional relationality.

Notation thus reveals not only how sound is written but how the ethics of listening is embodied and challenged by aesthetic representation. To perform Western music in a Japanese listening-world demands the cultivation of a dual literacy―specifically, a literacy of Ma characterized by relational attunement where sound is perceived as voice, alongside a literacy of AP that emphasizes structural projection and sound as depth. This demand for dual literacy is inherently reciprocal. Just as the Japanese performer must navigate the structural rigor of the West, the Western performer, when engaging with Japanese music, must confront the ethical challenge of suspending their habitual AP. To perform a Japanese work with structural transparency alone is to violate its relational integrity―to impose a depth where a field was intended. Therefore, dual literacy is not a one-sided adjustment by the non-Western other but a universal ethical requirement: the responsibility to relinquish one’s own dominant auditory habits in order to inhabit the specific ontological truth of the work, whichever world it belongs to. Only by inhabiting both systems can performers bridge the ontological gap between worlds.

7. Conclusion: toward a dual auditory literacy

This paper has argued that listening is not merely a sensory process but an ethical mode of being in the world. In this context, ethics is understood as ethos―a fundamental way of dwelling that determines our responsibility toward the “otherness” of sound. Western music relies on acoustic perspective (AP), a spatial hierarchy of pitch that reflects Schopenhauer’s view of music as an organized revelation of “the Will.”[19] John Cage, by contrast, rejects such structuring altogether, seeking through Zen philosophy to hear sound as pure presence without relational expectation or imposed meaning.[20] Japanese listening, as expressed through the concept of Ma, represents neither structural hierarchy nor its negation, but a relational emergence between self and environment.[21] Sound is perceived not as a neutral acoustic event nor merely as an intentional construction but as voice―a communicative presence that participates in the co-creation of meaning.

While the perceptual “flattening” inherent in relational listening has often been treated as a limitation, the account developed here reconceives this friction as a productive resource. The objective is not for the Japanese performer simply to imitate Western hierarchical depth but to consciously navigate the tension between that structural depth (AP) and a relational auditory sensitivity shaped by Ma. A performer who has mastered such dual literacy does not merely “correct” their hearing. Rather, they acquire a heightened awareness of a work’s architecture precisely because it remains other to their habitual somatic orientation. In this transparency, the structural logic of the West and the relational nuance of Japan can coexist without either erasing the other.

For this reason, the future of cross-cultural musicianship requires the cultivation of a dual auditory literacy: relational attunement (Ma), which involves listening with the world as voice and connection, and hierarchical articulation (AP), which involves sounding into the world as structural meaning. Such dual literacy is not merely technical; it constitutes an ethical responsibility. It enables Japanese musical culture to contribute its distinctive ontology of sound―rooted in relation―to a global discourse historically dominated by structural paradigms, while at the same time refusing to domesticate foreign works into familiar listening habits.

It is crucial to emphasize that dual literacy is not a form of cultural determinism. Although primary auditory orientations are strongly shaped by early linguistic, somatic, and cultural environments, they are not immutable. Relational listening is therefore not an exclusively Japanese capacity, just as hierarchical AP is not inherently Western. Both are learnable listening ontologies that require reflexive awareness of one’s habitual premises and the ability to suspend them when engaging with works grounded in another auditory ontology.

The struggle identified in this paper is thus not a permanent deficit but a cognitive friction that arises when one’s primary ontology remains unexamined. By consciously engaging the “other” ontology―whether a Westerner learning to perceive the relational field of Ma or a Japanese performer mastering the hierarchical depth of AP―the musician transcends initial conditioning. This capacity for neuroplasticity and conceptual expansion is precisely what makes listening an ethical act: it is the deliberate decision to enlarge one’s world beyond the confines of cultural upbringing.

This also entails reciprocity. The ethical responsibility described here is not unique to Japanese musical culture but is fundamental to cross-cultural musicianship as such. The claim is not that relational listening is more ethical than structural listening. Rather, each carries its own imperative: AP prioritizes fidelity to the structural integrity of the work, while Ma prioritizes attunement to the relational integrity of the field. The highest ethical responsibility, therefore, lies not in choosing one mode over the other but in cultivating the dual literacy that allows movement between them without imposing one’s somatic habit upon the other. Just as the Japanese performer must engage with Western structural depth, the Western performer must confront the challenge of suspending hierarchical listening when interpreting works grounded in Ma.

From this perspective, listening emerges as an ethical act not because it enforces normative correctness but because it determines how one positions oneself toward the emergence of meaning. The ethics of listening resides at the level of relation: whether one allows sound, environment, and work to enter a shared field or whether one forces them into a pre-given interpretive framework. The concept of Ma clarifies this ethical dimension by identifying listening as participation in a generative interval, rather than as the decoding of an object or the erasure of intention. Ethics, here, is not a rule to be obeyed but a responsibility to sustain relational openness. The ethics of listening therefore does not prescribe how one ought to hear; it renders audible the consequences of how one does hear―across cultures, traditions, and sonic worlds. In moving between these modes of listening, the listener becomes a bridge―a mediator through whom sonic worlds may finally hear one another.

 

Takahide Tanaka
dtmmet3_flart@nifty.com

Takahide Tanaka is a conductor, flutist, and researcher specializing in the relationship between auditory perception and spatial consciousness. He completed his orchestral research studentship at Toho Gakuen School of Music and is a founding member of the ensemble Art Respirant, where he also serves as planning producer and performer representative. Under his leadership, the ensemble received the 12th Kenzo Nakajima Music Award. Alongside his international career as a performer and producer, he conducts extensive research on the ethics of listening and vocal techniques based on auditory theory. He is a member of the Japanese Society for Music Perception and Cognition. His work focuses on the cross-cultural divergence of sound ontologies and their impact on musical practice.

 

Published on March 6, 2026.

Cite this article: Takahide Tanaka, “Listening as Ethics: Divergent Ontologies of Sound in Japanese Ma, Cage, and Schopenhauer,” Contemporary Aesthetics, Volume 24 (2026), accessed date.

 

Endnotes

[1] Arthur Schopenhauer, The World as Will and Representation, trans. E. F. J. Payne (Dover, 1969), 256-262.

[2] Ibid.

[3] John Cage, Silence: Lectures and Writings (Wesleyan University Press, 1961), 109.

[4] James Pritchett, The Music of John Cage (Cambridge University Press, 1993), 35-42.

[5] Kyle Gann, No Such Thing as Silence: John Cage’s 4′33″ (Yale University Press, 2010), 5-9.

[6] Pritchett, The Music of John Cage, 35-42.

[7] Gann, No Such Thing as Silence, 5-9.

[8] Arata Isozaki, Ma: Space–Time in Japan (Cooper-Hewitt Museum, 1979), 9-15.

[9] Richard B. Pilgrim, “Intervals (‘Ma’) in Space and Time,” History of Religions 25, no. 3 (1986): 255-258.

[10] Kitarō Nishida, An Inquiry into the Good, trans. Abe Masao and Christopher Ives (Yale University Press, 1987), 28-34.

[11] Tadanobu Tsunoda, The Japanese Brain: Uniqueness and Universality (Taishukan, 1984), 67-74.

[12] Tadanobu Tsunoda and K. Uehara, “Cultural Modulation of Hemispheric Specialization in Auditory Perception,” Journal of Cross-Cultural Neuroscience 5, no. 2 (2016): 118-123.

[13] K. Ikeda and T. Imada, “Shared Affective Processing of Environmental Sounds,” Cognitive, Affective, & Behavioral Neuroscience 13, no. 1 (2013): 146-150.

[14] Judith Becker, Deep Listeners: Music, Emotion, and Trancing (Indiana University Press, 2004), 71-79.

[15] Tsunoda, The Japanese Brain, 67-74.

[16] Catherine J. Stevens, “Music Perception and Cognition,” Topics in Cognitive Science 4, no. 4 (2012): 656-660.

[17] Carlo Rovelli, Reality Is Not What It Seems (Riverhead Books, 2016), 81-88.

[18] Nobuo Sato, ‘Ma’ no tetsugaku (Iwanami Shoten, 2010), 22-27.

[19] Florian Coulmas, Writing Systems (Cambridge University Press, 2003), 14-18.

[20] J. Marshall Unger, Literacy and Script Reform in Japan (Curzon Press, 1996), 41-46.

[21] Nishida, An Inquiry into the Good, 28-34.