Question 1

How much training data do I need for fine-tuning to work?

Accepted Answer

The data requirements depend heavily on the fine-tuning technique and the scope of the adaptation. For instruction fine-tuning with LoRA, meaningful improvement is achievable with as few as 500 to 1,000 high-quality instruction-response examples. For broader domain adaptation or complex style alignment, 5,000 to 50,000 examples produce more robust results. For continued pre-training to acquire domain vocabulary, you need a larger corpus of domain text, typically hundreds of megabytes to several gigabytes. The quality of training data matters far more than quantity: 500 carefully curated, accurate examples will outperform 10,000 noisy ones.

Question 2

Will fine-tuning make the model forget its general capabilities?

Accepted Answer

Catastrophic forgetting, where fine-tuning on a narrow domain degrades general capability, is a real risk in naive fine-tuning approaches. We mitigate this through several mechanisms: including a proportion of general instruction examples in the training mix, using LoRA or QLoRA which modify only a small fraction of the model's parameters, and running regression tests against general benchmarks before and after fine-tuning to confirm no degradation. For most enterprise fine-tuning tasks, the goal is domain enhancement rather than domain replacement, and correctly configured training preserves general capability while improving domain performance.

Question 3

What is the difference between fine-tuning an open-source model vs a commercial API model?

Accepted Answer

Open-source models like Llama 3, Mistral, and Qwen can be fine-tuned with full access to the model weights, deployed on any infrastructure, and the fine-tuned weights are yours entirely. Commercial API models like GPT-4 and Claude offer limited fine-tuning options through their APIs, with restrictions on what can be fine-tuned, training data that passes through their infrastructure, and deployment locked to their API. For Australian organisations with data sovereignty requirements, open-source fine-tuning on sovereign infrastructure is typically the only viable path.

Question 4

How long does a fine-tuning project take?

Accepted Answer

A standard enterprise fine-tuning engagement from data assessment to production deployment takes eight to twelve weeks. The breakdown is: two to three weeks for data assessment and curation (often the most time-consuming phase), one to two weeks for base model selection and baseline evaluation, one to two weeks for fine-tuning runs and hyperparameter search, and two to three weeks for evaluation, refinement, and production deployment. The data curation phase can be accelerated if your organisation already has well-structured instruction-response data, and extended if significant curation work is needed.

Question 5

Do we need to fine-tune again when the model is updated or when new open-source models are released?

Accepted Answer

When a significantly better base model is released, re-fine-tuning on the new base model is typically worthwhile because the improvement in the underlying model capabilities compounds with your domain fine-tuning. The cost of re-fine-tuning on an existing dataset is significantly lower than the initial fine-tuning engagement, since data curation is already complete. We maintain your training datasets and evaluation test sets so re-fine-tuning on a new base model can be completed quickly. For most deployments, we recommend evaluating re-fine-tuning annually or when major new open-source model generations are released.

Question 6

Can fine-tuning embed compliance requirements into the model's responses?

Accepted Answer

Yes, with important caveats. Compliance embedding through fine-tuning means training the model to include required disclosures, avoid non-compliant statements, and follow regulatory guidance in its responses. This works well for consistent compliance patterns, such as always including a required disclaimer or structuring advice according to a prescribed format. It works less well for compliance requirements that are highly context-dependent and require case-by-case judgment. For the latter, RAG retrieval of the relevant compliance guidance combined with a system prompt is more reliable than trying to encode compliance judgment into model weights.

LLM Fine-Tuning Services for Australian Organisations

Why Fine-Tuning Matters for Enterprise AI

Domain Vocabulary and Technical Accuracy

Organisational Style and Compliance

Instruction Following at Your Standard

Fine-Tuning Approaches and Techniques

Supervised Fine-Tuning (SFT)

Parameter-Efficient Fine-Tuning (LoRA and QLoRA)

Reinforcement Learning from Human Feedback (RLHF)

Continued Pre-Training

Model Selection and Baseline Evaluation

Evaluation and Validation

How a Fine-Tuning Engagement Works

Data Assessment and Curation

Base Model Selection and Baseline

Fine-Tuning Runs and Hyperparameter Search

Evaluation, Deployment, and Monitoring

Fine-Tuning for Regulated Australian Industries

Sovereign Training Infrastructure

When Fine-Tuning Delivers the Most Value

Related AI Solutions

RAG Architecture Australia

Private LLM Cost Australia

On-Premises LLM Deployment

Frequently Asked Questions

Adapt an LLM to Your Domain, on Your Infrastructure