The Human Layer of Artificial Intelligence
- Esra OBUT
In recent years, I have worked on international platforms such as Outlier, Invisible, and micro1. These companies rely on human expertise to improve artificial intelligence models and work with specialists from different fields around the world on a project basis.
On these platforms, I took on three different roles. As a tasker, I produced the texts that AI models needed. As an evaluator, I assessed the generated content for linguistic accuracy, contextual fit, and cultural appropriateness. In some projects, I worked in a more editorial role, reviewing texts for coherence, punctuation, and consistency. Over time, I was also placed on the Queue Manager waitlist, which meant moving closer to a position responsible for coordinating projects and managing other contributors.
The working process differed from project to project. To join a project, you first had to study its specific rules and guidelines, pass an assessment, and reinforce your knowledge through webinars. No two projects had exactly the same rules. Each one was an independent world shaped by its own context and standards. I worked mainly in Turkish text localization and cultural content production. Through this process, I was left not only with the question of who teaches artificial intelligence a language and a culture, but also with a clearer realization: AI is not an independent system working on its own. Behind it, there is human knowledge, human labor, and human judgment.
Artificial intelligence does not work alone
Artificial intelligence is often described as a system that learns, thinks, and makes decisions on its own. But this narrative is incomplete. Behind many of the AI systems we use today are not only engineers, data scientists, and large datasets, but also domain experts, language specialists, translators, editors, content evaluators, lawyers, healthcare professionals, educators, and people with cultural and contextual knowledge.
For a model to produce better responses, those responses need to be evaluated by humans. Whether a text sounds natural, whether an answer fits the context, whether an expression may come across as culturally problematic, or whether an image represents something accurately often cannot be determined through purely technical measurement. This is where human domain knowledge, language experience, intuition, and judgment come into play.
This is why it is not enough to think of artificial intelligence only as a technological issue. AI is also a matter of language, culture, interpretation, meaning, and human judgment. This is where the growing importance of the humanities, editorial thinking, and cultural literacy in this field becomes clear, because these systems do not merely process information; they also try to produce outputs that appear meaningful, accurate, and contextually appropriate to humans.
Ethan Mollick approaches artificial intelligence not as a fully automated tool, but as a form of "co-intelligence" that gains meaning when it works together with humans. I find this approach important because using AI effectively does not mean accepting everything it produces at face value. On the contrary, working with AI requires humans to become more careful, more selective, and better evaluators.
Localization is not merely translation
On these platforms, I worked mainly on Turkish projects. This was not a coincidence. In these kinds of companies, each language is usually handled by people who are native speakers of that language. The reason is simple: only someone who has grown up within a language can sense its cultural nuances, contextual accuracy, and subtle mistakes, because language is not made up of words alone.
You cannot always check in a dictionary whether a sentence sounds natural, whether an expression is too formal or too artificial, or whether it suits the context. Sometimes the problem is not grammar. A sentence may be technically correct, but that is simply not how it would be said in Turkish. Sometimes an answer provides information, but its tone is wrong. Sometimes the content seems harmless, yet within its cultural context, it feels strange, incomplete, or out of place.
This is exactly where localization comes in. This process involves more than translating a text into another language. It requires ensuring that an AI model's answer to a question is not only grammatically correct but also culturally situated.
The work of Emily M. Bender, Timnit Gebru, and their co-authors on large language models is also relevant here. Language models can produce impressive and fluent texts, but evaluating the meaning, context, and social impact of those texts remains a human responsibility. The fact that a text is fluent does not mean that it is accurate, reliable, or appropriate.
One of the things I noticed in these projects was that when we talk about something being "culturally correct," we are not actually talking about something fixed. As the context changes, the boundaries of what is correct also change. A response may be appropriate in one scenario and problematic in another. The same word may sound natural in one place and rude, too distant, or too artificial in another.
How artificial intelligence "sees" Turkey
Apart from text-based projects, I also worked on a project related to visual data. In this project, I submitted photos I had taken myself as AI training data. The photos included visual elements related to Turkey, such as signs written in Turkish, Turkish food, historical monuments, and traditional clothing.
At first glance, taking a photo, uploading it, and submitting it may seem like a very simple task. But the process was actually more complex than that, because depending on the project title, you had to decide which image "represented Turkey." Is a sign more representative, or is a dish, a historical building, a traditional garment, or a modern street scene?
The photo I took, the angle I chose, and my decision that "this image fits this title" all contributed, even if in a small way, to the visual memory of Turkey within an AI system. This contribution may not seem decisive on its own, but when thousands of similar choices come together, they begin to influence how a model recognizes a country, a culture, or a form of everyday life.
Kate Crawford argues that artificial intelligence is not as "artificial" as it may seem; it is built through human labor, data, classification, infrastructure, and social context. This idea closely matches what I observed in these projects. The cultural memory of artificial intelligence is shaped not only by large datasets but also by these small yet consequential human decisions.
Human judgment begins where rules end
Edge cases were among the most instructive parts of these projects. Every project had detailed rules. But no set of rules can cover every possibility in life.
In some cases, the content was not clearly right or wrong. A response was partly appropriate but had shortcomings. An expression was linguistically fine but contextually debatable. An image matched the title but had weak representational value. In such moments, simply applying the rules was not enough; it was necessary to understand what the rules were trying to achieve and interpret that purpose according to the specific situation.
Because of NDA obligations, I cannot share specific project, client, or content examples. But in general terms, I can say that the same content could sometimes be evaluated differently by two different evaluators. This did not always mean that someone had made a mistake. Often, it reflected the nature of human judgment, because evaluation is not only measurement; it is also interpretation.
Lucy Suchman's work on human-machine relations is worth remembering here. Technological systems are never merely technical structures. They gain meaning through the context in which they are used, the people who interact with them, and the decisions made in the moment. This is precisely what makes the human labor behind AI so important.
Human evaluation is essential in everyday and professional life
This issue is not limited to AI training platforms. Today, artificial intelligence is increasingly used in companies, agencies, content teams, and consulting processes. Yet the same basic issue remains: AI produces outputs, but a human must decide whether those outputs are good, accurate, contextually appropriate, ethical, reliable, and usable.
AI often speaks with persuasive fluency. Even when it is wrong, it can sound confident. It may work with incomplete context, miss the cultural tone, produce overly general statements, or write texts that do not match an institution's voice. For this reason, one of the most important skills in the age of AI is not only writing good prompts, but also making good evaluations.
In NIST's AI Risk Management Framework, AI systems are described as socio-technical systems. This is an important distinction because these systems are not made up only of models, code, or algorithms. They operate together with human behaviors, organizational culture, usage context, risk perception, and decision-making processes.
Human-in-the-loop is not an abstract concept
Today, the phrase "human-in-the-loop" is used often, but it can sometimes remain too abstract. In my own experience, however, it had a very concrete meaning. A human was really there, reading a sentence, scoring a response, selecting an image, considering whether an expression sounded natural, interpreting a guideline, and making a decision in a borderline case.
I believe the future of working with artificial intelligence will be shaped not by more automation alone, but by more qualified human evaluation. Not merely by receiving faster outputs, but by understanding the value of those outputs. Not by replacing humans with AI, but by making human judgment more visible and more effective together with AI.
The invisible layer with concrete impact
Today, artificial intelligence is often discussed through large models, algorithms, automation, and productivity. Yet there is often a missing layer in these conversations: the human layer.
For AI to provide better answers, speak more naturally, and produce content that is more appropriate to cultural context, someone needs to keep evaluating it. Partnership on AI's work on data enrichment workers also draws attention to this invisible human labor. People who label data, evaluate content, and perform quality control play a critical role in the development of AI systems.
Especially in Turkey, companies adopting AI need to think more seriously about this human layer, because using these systems is not only a matter of integrating a tool. To evaluate whether a model working in Turkish is truly performing well, there is a need for people who know Turkish, understand the culture, and can read the context.
AI provides speed, multiplies possibilities, and produces the first draft, but what determines its value is how that output is evaluated through human knowledge. This is why the real issue in the age of AI is not that machines will simply replace humans, but that humans will need to use their professional judgment more consciously, more responsibly, and more visibly.
References
Bender, Emily M.; Gebru, Timnit; McMillan-Major, Angelina; Shmitchell, Shmargaret. "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?" Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21), 2021.
Crawford, Kate. Atlas of AI: Power, Politics, and the Planetary Costs of Artificial Intelligence. Yale University Press, 2021.
Mollick, Ethan. Co-Intelligence: Living and Working with AI. Portfolio / Penguin, 2024.
NIST. Artificial Intelligence Risk Management Framework (AI RMF 1.0), NIST AI 100-1. National Institute of Standards and Technology, 2023.
Partnership on AI. Responsible Sourcing Across the Data Supply Line.
Suchman, Lucy. Human-Machine Reconfigurations: Plans and Situated Actions. Cambridge University Press, 2007.