Sovereign.
Multimodal.
128B parameters.
A global reference for Turkish reading comprehension and natural-language inference. Frontier-level on scientific reasoning and grade-school math. First production release of multimodal capability.
AIGENCY V4 — at a glance
Results of a comprehensive evaluation conducted on 27 April 2026 with 13,344 real API calls. V3's four independence principles (zero external parameter dependency, sovereign data residency, transparent documentation, Turkish morphological context fidelity) are preserved; multimodal capability (visual input understanding, document Q&A, chart and mathematical-image interpretation) has been added.
- Belebele-TR 87.33% — global reference
- TQuAD 82.40% — extractive QA
- TR-MMLU 70.80% — Turkish academic
- XNLI-TR 73.40% — natural-language inference
- TR Grammar 79.00% — grammar
- ARC-Challenge 94.88% — tied at frontier
- GSM8K 94.62% — top tier on grade-school math
- Same band as frontier models
- HumanEval 84.15%, HumanEval+ 79.88%
- MBPP 84.82%, MBPP+ 78.04%
- Instruction following: IFEval (strict) 80.22%
- Hallucination resistance: TruthfulQA MC1 76.38%
- MMMU 53.33%, ChartQA 67.68%
- DocVQA 79.17%, MathVista 34.13%
- Domestic 8B-parameter vision encoder
- Fine-tuned with 8M Turkish-captioned images
One-line Positioning
AIGENCY V4 — a sovereign AI model that leads globally on Turkish reading comprehension and natural-language inference, sits at frontier level on scientific reasoning and grade-school math, and remains in active development on multimodal capability and graduate-level scientific expertise.
A three-component modular design
AIGENCY V4 consists of three main components: a 120B text core inherited from V3, an 8B sovereign vision encoder added in V4, and a hierarchical memory bus that joins them via cross-modal projection. The visual stream is kept optional; the text path is never disrupted.

Text Core — 120B
120-billion-parameter sovereign text core inherited from V3. Adaptive LoRA+, Selective Layer Collapse, Localised Mixture-of-Experts (L-MoE), 4-bit block quantization and chunked attention optimizations preserved.
Vision Encoder — 8B (V4 New)
Domestically designed 8.2-billion-parameter vision encoder built from scratch at eCloud. YerLi-ViT-H, 24 layers, native 384×384 px resolution. Fine-tuned with 8 million Turkish-captioned images.
Cross-Modal Projection
The vision encoder output is projected into the text core's embedding space via a 2-layer MLP: 1280 → 2048 → 4096. GeLU activation with LayerNorm preserves visual-text alignment.
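A minimal PyTorch sketch of such a projector is shown below. The whitepaper specifies only the 1280 → 2048 → 4096 dimensions and the GeLU + LayerNorm combination; the module name, the exact placement of each LayerNorm, and the per-patch input shape are assumptions.

```python
import torch.nn as nn

class CrossModalProjector(nn.Module):
    """Sketch: maps vision-encoder features (1280-d) into the
    text core's embedding space (4096-d) via a 2-layer MLP."""
    def __init__(self, vision_dim=1280, hidden_dim=2048, text_dim=4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, hidden_dim),
            nn.GELU(),
            nn.LayerNorm(hidden_dim),   # placement relative to GELU is assumed
            nn.Linear(hidden_dim, text_dim),
            nn.LayerNorm(text_dim),
        )

    def forward(self, patch_embeddings):   # (batch, n_patches, 1280)
        return self.proj(patch_embeddings)  # (batch, n_patches, 4096)
```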
Hierarchical Memory (HBM)
Three-tier persistent memory: STM (4K tokens, AES-256-XTS), ITM (64K tokens, AES-256-XTS), LTM (278K tokens, ChaCha20-Poly1305 + TPM-sealed). Managed via TG-Decay time-guided expiration.
Optimization Stack Inherited from V3
The five optimization techniques defined and validated in V3 are preserved unchanged in V4. The goal of this continuity is to guarantee that adding multimodal capability does not regress core text performance.
2.1 Adaptive LoRA+

If the contextual density metric falls below threshold, the head is excluded from LoRA updates; above threshold, adaptive rank expansion is applied.
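The sketch below illustrates this gating rule under stated assumptions: the threshold value, the rank bounds, and the linear rank-scaling law are hypothetical; only the below-threshold exclusion and above-threshold rank expansion come from the description above.

```python
import torch

def adaptive_lora_plus_ranks(head_density, tau=0.15, base_rank=8, max_rank=32):
    """Per-head LoRA+ gating: heads whose contextual-density score falls
    below tau are excluded from LoRA updates (rank 0); heads above tau
    get an adaptively expanded rank. tau and the rank law are assumed.
    head_density: 1-D tensor of per-head density scores in [0, 1]."""
    active = head_density >= tau
    # Linear rank expansion with density above the threshold (assumed law).
    span = (head_density - tau).clamp(min=0) / (1.0 - tau)
    ranks = (base_rank + span * (max_rank - base_rank)).round().long()
    return torch.where(active, ranks, torch.zeros_like(ranks))
```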
2.2 Selective Layer Collapse

Instead of classical layer pruning, spectral clustering is applied to channel outputs; the resulting clusters are merged and re-orthonormalized via QR factorization.
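A compact sketch of the collapse step, assuming absolute channel correlation as the spectral affinity and mean-merging within clusters (neither detail is specified above):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

def collapse_layer(W, n_clusters):
    """Selective Layer Collapse sketch: spectrally cluster the output
    channels of weight matrix W (out_channels x in_features), merge each
    cluster by averaging, then re-orthonormalize via QR factorization."""
    affinity = np.abs(np.corrcoef(W))            # assumed channel affinity
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(affinity)
    merged = np.stack([W[labels == c].mean(axis=0) for c in range(n_clusters)])
    Q, _ = np.linalg.qr(merged.T)                # orthonormalize merged rows
    return Q.T                                   # (n_clusters, in_features)
```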
2.3 Localised MoE (L-MoE)

Traditional MoE selects from a global expert pool for every input; L-MoE instead computes routing as a softmax over the scores between the user-task vector and each expert's task signature.
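A sketch of the routing rule; the dot-product scoring and the top-k selection are assumptions, while the softmax over user-task vector and expert task signatures follows the description above.

```python
import torch
import torch.nn.functional as F

def lmoe_route(task_vec, expert_signatures, top_k=2):
    """L-MoE routing sketch: score each expert's task signature against
    the user-task vector, softmax the scores, keep the top-k experts.
    task_vec: (d,); expert_signatures: (n_experts, d)."""
    scores = expert_signatures @ task_vec        # (n_experts,)
    weights = F.softmax(scores, dim=-1)
    top_w, top_idx = weights.topk(top_k)
    return top_idx, top_w / top_w.sum()          # renormalize over top-k
```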
2.4 4-bit Block Quantization
![w_q = round(w / α) ∈ [−7, 7]](/whitepaper-v4/equations/eq04_quant.png)
Weight tensors are partitioned into 64-element blocks; each block is mapped, via min-max thresholding with a per-block scale α, into the signed 4-bit range [−7, 7]. Weight storage shrinks by roughly 75% (22 GB → 6 GB).
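A NumPy sketch of the block scheme under one assumption: the per-block scale α is derived from the block's maximum magnitude, which is one common reading of the min-max thresholding mentioned above; the 64-element blocks and w_q = round(w/α) ∈ [−7, 7] follow the equation.

```python
import numpy as np

def quantize_4bit_blocks(w, block=64):
    """Symmetric 4-bit block quantization: flatten into 64-element blocks,
    derive a per-block scale alpha, and round weights into [-7, 7].
    Assumes w.size is a multiple of the block size."""
    w = w.reshape(-1, block)
    alpha = np.abs(w).max(axis=1, keepdims=True) / 7.0
    alpha[alpha == 0] = 1.0                      # guard all-zero blocks
    q = np.clip(np.round(w / alpha), -7, 7).astype(np.int8)
    return q, alpha                              # dequantize: w ~ q * alpha
```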
2.5 Chunked Attention

To reduce the O(n²) memory and time cost on long context windows, the length-n sequence is split into b chunks and full attention is computed within each chunk.
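A minimal PyTorch sketch of the within-chunk computation (block-diagonal attention); the chunk size here is arbitrary, and any cross-chunk mechanism the production kernel may use is not shown.

```python
import torch
import torch.nn.functional as F

def chunked_attention(q, k, v, chunk=1024):
    """Split the length-n sequence into chunks and run full attention only
    inside each chunk, reducing O(n^2) cost to O(n * chunk).
    q, k, v: (..., n, head_dim)."""
    n = q.shape[-2]
    out = []
    for s in range(0, n, chunk):
        e = min(s + chunk, n)
        out.append(F.scaled_dot_product_attention(
            q[..., s:e, :], k[..., s:e, :], v[..., s:e, :]))
    return torch.cat(out, dim=-2)
```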
278K tokens, 3-tier memory, auditable forgetting
Contextual Core-Wrapping (CCW) turns the input stream into atomic context spheres, over which a diversified recursive attention scheme computes attention hierarchically. The Hierarchical Memory Architecture (HBM) manages the three-tier STM/ITM/LTM model via TG-Decay time-guided expiration; a minimal eviction sketch follows the tier list below.

- STM: instant (last 90 s); FIFO eviction plus density < 0.05
- ITM: session-scoped; task-id matching; weighted LRU
- LTM: persistent; user 'remember' flag; TPM-sealed per-record key
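As referenced above, a minimal sketch of the STM tier's eviction rule; the item representation and the density score are simplifications, while the 90 s window and the density < 0.05 cutoff come from the tier list.

```python
import time
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    content: str
    density: float                    # contextual-density score in [0, 1]
    ts: float = field(default_factory=time.time)

def evict_stm(stm, now=None, max_age=90.0, min_density=0.05):
    """TG-Decay eviction for the STM tier: drop items older than 90 s or
    below the density cutoff; survivors keep their FIFO order."""
    now = time.time() if now is None else now
    return [m for m in stm
            if now - m.ts <= max_age and m.density >= min_density]
```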
Measurable Gains (V2 → V4)
| Metric | V2 | V4 |
|---|---|---|
| Semantic drift (multi-doc) | 4.3% | 0.9% |
| In-session forgetting | 3.1% | 0.7% |
| Context window limit | 64K | 278K (4.3×) |
| Memory lookup time (avg) | 34 ms | 18 ms |
Auditable Memory Operations
An identity signature SHA-256(mⱼ ‖ ts) is stored for each memory item mⱼ. The DELETE /aigency/memory/forget?id= call is traced end to end with identity verification; only a hash of the deleted item is retained in the audit log.

DELETE /aigency/memory/forget?id=<sha-256>
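A sketch of an auditable forget call; the base URL, bearer-token auth, and the byte encoding of mⱼ ‖ ts are hypothetical, while the endpoint and the SHA-256 identity signature follow the description above.

```python
import hashlib
import requests

def forget(base_url, item_bytes, ts, token):
    """Compute the identity signature SHA-256(m_j || ts) and issue the
    traced forget call. base_url and the auth scheme are deployment-
    specific assumptions."""
    sig = hashlib.sha256(item_bytes + ts.encode("utf-8")).hexdigest()
    r = requests.delete(f"{base_url}/aigency/memory/forget",
                        params={"id": sig},
                        headers={"Authorization": f"Bearer {token}"})
    r.raise_for_status()
    return sig   # the audit log retains only this hash of the deleted item
```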
Optional visual stream, two-step API protocol
V4's biggest innovation for the AIGENCY family is multimodal capability. The client first obtains a chat_id via a text-only newChat call, then sends visuals as multipart form data via sendMessage. The 'attachements' field name keeps its original spelling on the server side (to avoid breaking V3 API compatibility).
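A sketch of the two-step protocol; the host, the auth header, and the chat_id/text field names are hypothetical, while the newChat → sendMessage flow and the 'attachements' spelling follow the description above.

```python
import requests

BASE = "https://api.example.com"          # hypothetical endpoint host

def ask_about_image(image_path, question, token):
    hdrs = {"Authorization": f"Bearer {token}"}
    # Step 1: text-only newChat returns the conversation id.
    chat_id = requests.post(f"{BASE}/newChat", headers=hdrs).json()["chat_id"]
    # Step 2: sendMessage carries the visual as multipart form data.
    # The field name 'attachements' keeps its original server-side
    # spelling for V3 API compatibility.
    with open(image_path, "rb") as img:
        resp = requests.post(f"{BASE}/sendMessage", headers=hdrs,
                             data={"chat_id": chat_id, "text": question},
                             files={"attachements": img})
    return resp.json()
```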

Vision-Text Training Corpus (240 GB / 7.5M pairs)
| Category | Size | Pairs |
|---|---|---|
| Turkish-captioned imagery | 92 GB | 4.2 M |
| Legal document scans (anonymized) | 56 GB | 0.8 M |
| Academic figures & charts | 48 GB | 1.6 M |
| Anatomical & medical imagery | 30 GB | 0.4 M |
| Synthetic OCR & charts | 14 GB | 0.5 M |
Multimodal Safety Filter
Pre-Encoding
SHA-256 hash blocklist plus a lightweight 350M-parameter vision classifier screening for NSFW content, violence, IP infringement, and personal data.
Post-Encoding
Cross-modal output check: if the model response trends toward harmful content (toxicity threshold exceeded), the response is cut.
The false-positive rate of 10-15% in V4.0.0 was reduced to 2% with the V4.0.1 hotfix (active calibration).
Multimodal Benchmark Results
| Benchmark | AIGENCY V4 | Claude Opus 4.7 |
|---|---|---|
| DocVQA | 79.17 | 93.8 |
| ChartQA | 67.68 | 88.2 |
| MMMU | 53.33 | 84.1 |
| MathVista | 34.13 | 79.3 |
1,826 GB Turkish-priority corpus, GPG-signed pipeline
Trained on 128 NVIDIA H100 80GB GPUs with NVLink 4 using the proprietary ZeNO-3 (Zero-Redundancy Node-Optimised) algorithm. Data preprocessing: GPUDirect Storage + Zstandard compression (1-pass, ratio ≈ 2.4).
Data Sources (Text)
| Category | Size | Documents |
|---|---|---|
| Turkish books & articles | 680 GB | 3.1 M |
| Legal corpus | 412 GB | 20 M |
| Web forum & Q/A (TR) | 312 GB | 5.4 M |
| Code repositories | 210 GB | 42 M snippets |
| Scientific data (TR-EN) | 155 GB | 0.8 M |
| Synthetic dialogue | 57 GB | 1.9 M |
| TOTAL | 1,826 GB | 73.2 M |
Bias Detection & Mitigation
- TOXTR-Score: Turkish toxic word list + vector toxicity
- DEBIAN-Fair: DP_abs < 0.04 demographic parity target
- Rel-Bias: religious/ethnic association concept frequency
- HateXplain-TR FPR < 1.2%
- TOXTR average 0.031 (target ≤ 0.035)
- Demographic TPR ratio (F/M) = 0.97
RLHF & Behavioural Tuning
Recalibrated with Turkish data; average preference rate at V4 is 73%.
- 54 ethics + 37 software + 18 visual alignment = 109
- Pairwise method: A/B response comparison
- Bradley-Terry scores → reward model
13,344 real API calls, deterministic conditions
Every result is reported with a Wilson 95% confidence interval. All experiments were run against the same API endpoint, assistant slug, and seed.
Equal-Conditions Protocol
| Parameter | Setting |
|---|---|
| Temperature | 0.0 (deterministic) |
| Top-p | Disabled (greedy) |
| Max response tokens | Model's natural limit |
| Concurrency | 4-10 parallel workers |
| Backoff | 1s → 2s → 4s → 8s → 16s |
| Subsample seed | 42 |
Wilson 95% Confidence Interval

CI = [ p + z²/(2n) ± z·√( p(1−p)/n + z²/(4n²) ) ] / (1 + z²/n),  z = 1.96

p: observed rate; n: sample size. More robust than the normal approximation for binomials; remains stable even at small n.
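A direct implementation of the interval; applied to the Tier 2 MMLU row below (p = 0.8010, n = 1000) it reproduces the reported [0.775, 0.825].

```python
from math import sqrt

def wilson_ci(p, n, z=1.96):
    """Wilson score interval for an observed success rate p over n trials."""
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return centre - half, centre + half

print(wilson_ci(0.8010, 1000))   # ≈ (0.775, 0.825)
```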
22 Benchmarks — 4 Categories
Academic
- MMLU
- MMLU-Pro
- ARC-Challenge
- HellaSwag
- WinoGrande
- GPQA Diamond
Math & Code
- GSM8K
- MathVista
- HumanEval
- HumanEval+
- MBPP
- MBPP+
Accuracy & Instr.
- TruthfulQA MC1
- IFEval (strict)
Turkish & Multimodal
- TR-MMLU
- XNLI-TR
- TQuAD
- TR Grammar
- Belebele-TR
- MMMU
- ChartQA
- DocVQA
Side by side with frontier models, 22 benchmarks
Full result table reported with Wilson 95% CI. AIGENCY V4 sits at frontier level on ARC-C and GSM8K; upper-mid segment in code generation; in active development on GPQA-D and MMLU-Pro.

Tier 1 — Critical Comparison
| Benchmark | AIGENCY V4 | GPT-5 | Claude 4.6/4.7 | Gemini 3 Pro | Position |
|---|---|---|---|---|---|
| GSM8K | 94.62 | 96.8 | ~96 | ~94 | Tied @ frontier |
| ARC-Challenge | 94.88 | ~96 | ~96 | ~95 | Tied @ frontier |
| HellaSwag | 88.60 | ~95 | ~94 | ~94 | 6pp behind |
| MBPP | 84.82 | ~92 | ~91 | ~88 | 7pp behind |
| HumanEval | 84.15 | 94.0 | 95.0 | 89.7 | 11pp behind |
| IFEval (strict) | 80.22 | ~90 | ~86 | ~85 | 6pp behind |
| MMLU | 80.10 | 94.2 | 88-93 | 92.4 | 12pp behind |
| HumanEval+ | 79.88 | ~91 | ~89 | ~85 | 9pp behind |
| MBPP+ | 78.04 | ~86 | ~84 | ~81 | 6pp behind |
| TruthfulQA MC1 | 76.38 | ~81 | ~77 | ~75 | Tied |
| WinoGrande | 74.66 | ~88 | ~86 | ~82 | 11pp behind |
| MMLU-Pro | 50.20 | ~85 | ~84 | ~81 | Development area |
| GPQA Diamond | 37.88 | 88-94 | 91.3-94.2 | 91.9 | Development area |
Turkish-specific
No frontier publication — de facto global reference
| Benchmark | Accuracy | n |
|---|---|---|
| Belebele-TR (native reading comprehension) | 87.33 | 900/900 |
| TQuAD (F1 ≥ 0.5, Turkish extractive QA) | 82.40 | 500/500 |
| TR Grammar (Turkish grammar) | 79.00 | 100/100 |
| XNLI-TR (natural-language inference) | 73.40 | 500/500 |
| TR-MMLU (Turkish academic) | 70.80 | 500/500 |
Tier 2 — Mid-volume
Stratified subsample (n=1000)
| Benchmark | Accuracy | Wilson 95% CI |
|---|---|---|
| MMLU | 0.8010 | [0.775, 0.825] |
| MMLU-Pro | 0.5020 | [0.471, 0.533] |
| HellaSwag | 0.8860 | [0.865, 0.904] |
| WinoGrande XL | 0.7466 | [0.722, 0.770] |
| HumanEval+ | 0.7988 | [0.731, 0.853] |
| MBPP+ | 0.7804 | [0.736, 0.819] |
Operational Performance

| Metric | Value | Target |
|---|---|---|
| Total API calls (test) | 13,344 | — |
| Persistent error rate | 0.3% | 1% |
| Avg latency | 9.55 s | 6 s |
| p50 latency | 4.39 s | 3 s |
| p95 latency | 32.77 s | 25 s |
| p99 latency | 33.59 s | 30 s |
| Auto-recovery success | 98.4% | 97% |
| Chaos test success | 100% | 99% |
Same core, multimodal added
V3 (Q1 2025) was the first AIGENCY release free of any LLAMA3 dependency. V4's development philosophy is to preserve the independence claims established in V3 while building multimodal capability on top.

| Optimization | Parameters ↓ | Memory ↓ | Latency ↓ | Note |
|---|---|---|---|---|
| Adaptive LoRA+ | 11% | 7% | 5% | Preserved from V3 |
| Selective Layer Collapse | 9% | 6% | 3% | Preserved from V3 |
| Localised MoE | — | — | 18% | Active experts ↓ |
| 4-bit block quantization | 45% | 73% | 12% | Weight storage |
| Chunked attention | — | 28% | 21% | On long context |
| Vision encoder (new) | +6.7% | +2.1 GB | +~3 s/img | V4 addition |
| NET EFFECT | 14.9% | 62.4% | 42% | Text path, V3 baseline |
Multi-layer encryption, post-quantum readiness
Encryption at rest and in transit across every layer including memory, model parameters, and the image cache. Compliance with KVKK, ISO/IEC 27001, ETSI EN 303 645, NIST SP 800-207, EU AI Act.
Memory Encryption Architecture
| Layer | Cipher | Note |
|---|---|---|
| STM/ITM (RAM) | AES-256-XTS | Never swapped out of RAM |
| LTM (disk) | ChaCha20-Poly1305 | PFS, per-record key, TPM-sealed |
| Model parameters | AES-256-GCM | Single-use session key, HW-RNG |
| Image cache (V4 new) | AES-256-GCM + HKDF-SHA-512 | 30 MB limit, 24h TTL |
Post-Quantum Readiness
| Module | PQ | Date |
|---|---|---|
| Memory encryption (LTM) | XChaCha-Kyber1024 hybrid | 2026/Q2 |
| Model card signature | Falcon-1024 | 2026/Q3 |
| API mTLS | SIKE-p503 fallback | 2026/Q4 |
Compliance Mapping
| Standard | Controls |
|---|---|
| KVKK | Data minimization, encryption, access logs |
| ISO/IEC 27001 | BT-ISMS, risk & control matrix |
| ETSI EN 303 645 | IoT API authentication |
| NIST SP 800-207 | Zero-Trust: mTLS, least privilege, continuous monitoring |
| EU AI Act | High-risk class, model card |
| Image retention | Images auto-deleted after 24 h |
Differential Privacy
- Summary statistics: ε = 3.0 (Laplace noise)
- Log-based usage graph: ε = 5.0 (Exponential mechanism)
- Auto fine-tune feedback: ε = 7.5 (Subsample-and-Aggregate)
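A minimal sketch of the first mechanism; the count query and its sensitivity of 1 are illustrative assumptions, while the ε = 3.0 Laplace setting comes from the list above.

```python
import numpy as np

def laplace_mechanism(true_value, sensitivity, epsilon):
    """Release a summary statistic under epsilon-differential privacy by
    adding Laplace(sensitivity / epsilon) noise."""
    return true_value + np.random.laplace(scale=sensitivity / epsilon)

# Illustrative count query (sensitivity 1) at the document's epsilon = 3.0:
noisy_count = laplace_mechanism(12_847, sensitivity=1, epsilon=3.0)
```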
From score profile to 8-sector deployment
AIGENCY V4's global value proposition in one sentence: the default choice for every enterprise AI workload that runs on Turkish content, must be KVKK-compliant and data-sovereign, and requires long-document processing. The sector selection is not random — each sector is directly justified by V4's scores.
Public Sector & Government
P0 · KVKK §5/§12 compliance, Türkiye DC residency, GPG-signed transparent training pipeline, #1 in Turkish text with Belebele-TR 87.33 / TQuAD 82.40.
- Intra-ministry document Q&A (4M+ regulations)
- Citizen service assistant (e-Devlet integration)
- Judicial support (20M case corpus)
- Tender specification analysis
Legal & LegalTech
P0 · Yargıtay, Danıştay, ECHR, Official Gazette, TBMM minutes — the 20M judgment-and-regulation corpus is a unique database worldwide.
- Case-law search & precedent finding
- Contract risk scan (XNLI-TR 73.40)
- Client summary briefing (RLHF Turkish tone)
- Court decision classification
Banking & Finance
P0 · Turkish-heavy KYC/AML documents, BDDK compliance texts, Turkish contracts — KVKK-resident hosting is mandatory.
- KYC document understanding (TR Grammar 79.00 + ChartQA 67.68)
- Turkish risk report summarization
- Contract compliance check (DocVQA 79.17)
- Customer service assistant
Education & Higher Ed
P1 · TR-MMLU 70.80, MMLU 80.10, GSM8K 94.62, TR Grammar 79.00 — best non-frontier profile for Turkish education.
- High-school/university course assistant
- Entrance exam prep platforms
- Turkish software education (HumanEval 84.15)
- Academic paper search
Healthcare & Hospital Systems
P1 · KVKK + health data sensitivity → full sovereignty mandatory. 30 GB anatomical/medical image training data (patient-consented).
- Patient file summary (Turkish anamnesis)
- SGK code matching
- Clinical research protocol translation
- Drug leaflet editing
Defense & Critical Infrastructure
P1 · A 'no non-sovereign option' domain: no foreign hosting, no closed-source or unaudited models.
- Intelligence report Turkish summary
- Logistics & supply analysis
- Turkish-interaction training simulation
- Open-source code audit
Media & Publishing
P2 · Turkish grammar rules, idiom/proverb sensitivity, editorial tone — TR Grammar 79.00 + RLHF Turkish calibration = professional publication quality.
- News editing
- Turkish subtitle / dub script generation
- Publishing text editing (278K context)
- Corporate communication Turkish editor
Software & R&D
P2 · HumanEval 84.15 / MBPP+ 78.04 → upper-mid frontier code competence. Large codebase analysis with 278K context.
- Code review assistant (Turkish explanation)
- Documentation generation (TR-EN bilingual)
- API spec → client code
- Legacy system Turkish comment cleanup
The foundation of scientific credibility: no hidden gaps
This whitepaper transparently states V4's weaknesses and limitations alongside its strengths. The V4.1 roadmap identifies these areas as the primary improvement priorities.
GPQA Diamond & MMLU-Pro
GPQA Diamond 0.379 and MMLU-Pro 0.502 sit 35-50pp below frontier models. Reason: V4's graduate-level physics, chemistry, and biology training data are insufficient. The V4.1 roadmap plans an academic data-sourcing programme with Turkish universities.
Multimodal Capabilities First Release
MMMU 0.533, MathVista 0.341, ChartQA 0.677 — 20-40pp behind frontier vision models. V4.1 targets: vision encoder 8B → 16B; Turkish-specific vision-text corpus 240 GB → 600 GB.
Latency 2-3× Frontier
V4 averages 9.55 s with p95 at 32.77 s; frontier models average 3-5 s with p95 of 8-12 s. Reasons: vision encoder overhead, cross-modal projection, and the multimodal safety filter.
Multimodal Safety Filter False-Positive
10-15% in V4.0.0; reduced to 2% in V4.0.1 via active calibration.
V4.1 → V4.2 → V5: concrete improvement targets
V4.1 (2026/Q4)
- Vision encoder 8B → 16B parameters, 24 → 32 layers
- Turkish-specific vision-text corpus 240 GB → 600 GB
- MMLU-Pro target: 0.50 → 0.65
- GPQA Diamond target: 0.38 → 0.55
- Latency: avg 9.55 s → 4 s, p95 32.77 s → 15 s
V4.2 (2027/Q1)
- Multi-image mode (up to 8 images per request)
- Video input (2 FPS frame sampling for clips up to 60 s)
- Speech-to-text integration (sovereign ASR)
V5 (2027/Q3)
- Heterogeneous AI accelerators (GPU + ASIC + FPGA)
- Hierarchical MoE (H-MoE)
- Continual learning (Elastic Replay Buffer)
- Full post-quantum compliance
Audit-ready, modular open-sourcing
The training pipeline, HBM/CCW reference implementations, the vision encoder, and the cross-modal projection will be open-sourced step by step. Excluding the PII-redaction stage, the full pipeline is open to academic audit from Q3 2026.
| Component | License | Release |
|---|---|---|
| Training pipeline | Apache-2.0 | 2026/Q3 |
| HBM/CCW reference | AGPL-3.0 | 2026/Q4 |
| Vision encoder reference | AGPL-3.0 | 2027/Q1 |
| Cross-modal projection | AGPL-3.0 | 2027/Q1 |
| Router-Bus & Adapter API | MPL-2.0 | 2026/Q4 |
| Benchmark infrastructure | MIT | 2026/Q3 |
A proof of sovereign science
AIGENCY V4 is the direct successor — with multimodal capability — of the fully sovereign AI family that eCloud Yazilim Teknolojileri started with V3. The evaluation conducted on 27 April 2026 with 13,344 real API calls and reported with Wilson 95% confidence intervals clearly establishes V4's position in the global landscape.
It demonstrates that a sovereign AI model designed for Turkish — globally competitive and fully independent — is technically feasible, runs reliably in production, and can be verified through transparent evaluation.