
Android (Kotlin) + ML Kit: Hands-On Recognition of English Alphanumeric CAPTCHAs on Mobile

news 2025/9/15 22:03:05

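1 Overview and Use Cases

Recognizing English alphanumeric CAPTCHAs directly on the device, from a screenshot or a photo, is useful for automated testing, accessibility aids, and internal tools. Google ML Kit's Text Recognition runs on-device (it works offline), which avoids a round trip to a server. To improve the recognition rate, we preprocess the image on the client (grayscale, binarization, denoising, and upscaling) before handing it to OCR.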

2 Environment and Dependencies

Android Studio Arctic Fox or later

Kotlin 1.5+

AndroidX

ML Kit Text Recognition (on-device API)

Add the dependencies to app/build.gradle (the module-level file); adjust the versions to match your Android Studio / Kotlin setup:

dependencies {
    implementation "androidx.appcompat:appcompat:1.4.0"
    implementation "com.google.mlkit:text-recognition:16.0.0" // ML Kit on-device
    implementation "com.google.android.material:material:1.4.0"
    implementation "androidx.constraintlayout:constraintlayout:2.1.2"
}

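(Note: ML Kit offers separate models for other scripts such as Chinese. This article uses only the default recognizer, which covers English letters and digits.)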

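3 Android Permissions and Manifest

Add the camera permission to AndroidManifest.xml (only needed if you enable photo capture):

<uses-permission android:name="android.permission.CAMERA" />

Keep the default settings in <application>. The CAMERA permission must also be requested at runtime (the code later in this article demonstrates this).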

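4 A Simple UI (activity_main.xml)

Create a minimal layout containing: select/capture buttons, an ImageView that shows the processed image, a recognize button, and a TextView that shows the result.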

<androidx.constraintlayout.widget.ConstraintLayout
    xmlns:android="http://schemas.android.com/apk/res/android"
    xmlns:app="http://schemas.android.com/apk/res-auto"
    android:layout_width="match_parent"
    android:layout_height="match_parent">

    <ImageView
        android:id="@+id/imageView"
        android:layout_width="0dp"
        android:layout_height="0dp"
        android:contentDescription="captcha"
        app:layout_constraintTop_toTopOf="parent"
        app:layout_constraintBottom_toTopOf="@+id/buttonRow"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        android:scaleType="fitCenter"
        android:adjustViewBounds="true"
        android:background="#EEE"/>

    <LinearLayout
        android:id="@+id/buttonRow"
        android:layout_width="0dp"
        android:layout_height="wrap_content"
        app:layout_constraintBottom_toTopOf="@+id/resultText"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        android:gravity="center"
        android:orientation="horizontal"
        android:padding="8dp">

        <Button
            android:id="@+id/btnSelect"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Select" />

        <Button
            android:id="@+id/btnCapture"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Capture"
            android:layout_marginStart="12dp"/>

        <Button
            android:id="@+id/btnProcess"
            android:layout_width="wrap_content"
            android:layout_height="wrap_content"
            android:text="Process+OCR"
            android:layout_marginStart="12dp"/>

    </LinearLayout>

    <TextView
        android:id="@+id/resultText"
        android:layout_width="0dp"
        android:layout_height="wrap_content"
        app:layout_constraintBottom_toBottomOf="parent"
        app:layout_constraintLeft_toLeftOf="parent"
        app:layout_constraintRight_toRightOf="parent"
        android:padding="12dp"
        android:textSize="18sp"
        android:textColor="#111"/>

</androidx.constraintlayout.widget.ConstraintLayout>

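5 Kotlin Main Activity (Core Logic)

Below is a complete, runnable skeleton of MainActivity.kt. It covers image selection/capture, the processing functions (grayscale, binarization, upscaling, denoising), the call to ML Kit's TextRecognizer, whitelist filtering, and result display.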

// MainActivity.kt
package com.example.captchaocr

import android.Manifest
import android.app.Activity
import android.content.Intent
import android.content.pm.PackageManager
import android.graphics.*
import android.net.Uri
import android.os.Bundle
import android.provider.MediaStore
import android.widget.Button
import android.widget.ImageView
import android.widget.TextView
import androidx.activity.result.contract.ActivityResultContracts
import androidx.appcompat.app.AppCompatActivity
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import java.io.IOException

class MainActivity : AppCompatActivity() {

    private lateinit var imageView: ImageView
    private lateinit var resultText: TextView
    private var currentBitmap: Bitmap? = null

    // On-device recognizer, created once and reused for every click.
    private val recognizer by lazy {
        TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)
    }

    private val pickImageLauncher =
        registerForActivityResult(ActivityResultContracts.StartActivityForResult()) { ar ->
            if (ar.resultCode == Activity.RESULT_OK) {
                ar.data?.data?.let { loadBitmapFromUri(it) }
            }
        }

    private val takePhotoLauncher =
        registerForActivityResult(ActivityResultContracts.StartActivityForResult()) { ar ->
            if (ar.resultCode == Activity.RESULT_OK) {
                // Without EXTRA_OUTPUT the camera app returns only a small thumbnail under "data".
                val bitmap = ar.data?.extras?.get("data") as? Bitmap
                bitmap?.let {
                    currentBitmap = it
                    imageView.setImageBitmap(it)
                }
            }
        }

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        setContentView(R.layout.activity_main)

        // Request CAMERA only if it has not been granted yet.
        val granted = ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA) ==
            PackageManager.PERMISSION_GRANTED
        if (!granted) {
            ActivityCompat.requestPermissions(this, arrayOf(Manifest.permission.CAMERA), 0)
        }

        imageView = findViewById(R.id.imageView)
        resultText = findViewById(R.id.resultText)

        findViewById<Button>(R.id.btnSelect).setOnClickListener {
            val intent = Intent(Intent.ACTION_PICK, MediaStore.Images.Media.EXTERNAL_CONTENT_URI)
            pickImageLauncher.launch(intent)
        }

        findViewById<Button>(R.id.btnCapture).setOnClickListener {
            val intent = Intent(MediaStore.ACTION_IMAGE_CAPTURE)
            takePhotoLauncher.launch(intent)
        }

        findViewById<Button>(R.id.btnProcess).setOnClickListener {
            currentBitmap?.let { bmp ->
                val processed = preprocessForOCR(bmp)
                imageView.setImageBitmap(processed)
                runTextRecognition(processed)
            } ?: run {
                resultText.text = "No image loaded"
            }
        }
    }

    private fun loadBitmapFromUri(uri: Uri) {
        try {
            // getBitmap is deprecated since API 29 but keeps this skeleton short.
            val bmp = MediaStore.Images.Media.getBitmap(contentResolver, uri)
            currentBitmap = bmp
            imageView.setImageBitmap(bmp)
        } catch (e: IOException) {
            e.printStackTrace()
            resultText.text = "Error: ${e.message}"
        }
    }

    // ------- Image preprocessing -------
    private fun preprocessForOCR(src: Bitmap): Bitmap {
        // 1. Grayscale
        val gray = toGrayscale(src)
        // 2. Upscale 2x (helps with small glyphs)
        val scaled = Bitmap.createScaledBitmap(gray, gray.width * 2, gray.height * 2, true)
        // 3. Light blur to suppress noise (optional)
        val denoised = gaussianBlur(scaled, 1)
        // 4. Binarize with a global Otsu threshold
        val bin = thresholdOtsu(denoised)
        // 5. Optional: morphology (a simple hand-written 3x3 dilate/erode)
        return simpleMorphology(bin)
    }

    private fun toGrayscale(src: Bitmap): Bitmap {
        val w = src.width
        val h = src.height
        val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
        val canvas = Canvas(bmp)
        val paint = Paint()
        val cm = ColorMatrix()
        cm.setSaturation(0f) // saturation 0 yields R = G = B
        paint.colorFilter = ColorMatrixColorFilter(cm)
        canvas.drawBitmap(src, 0f, 0f, paint)
        return bmp
    }

    private fun gaussianBlur(src: Bitmap, radius: Int): Bitmap {
        // A 3x3 box (mean) blur stands in for a true Gaussian; it is cheap and sufficient here.
        // `radius` only switches the blur on or off; the kernel size is fixed at 3.
        // Alternatives: RenderScript's ScriptIntrinsicBlur (deprecated) or a third-party library.
        if (radius <= 0) return src
        val w = src.width
        val h = src.height
        val pixels = IntArray(w * h)
        src.getPixels(pixels, 0, w, 0, 0, w, h)
        // Start from a copy so the 1-pixel border keeps its original values.
        val out = pixels.copyOf()
        for (y in 1 until h - 1) {
            for (x in 1 until w - 1) {
                var rSum = 0
                var gSum = 0
                var bSum = 0
                for (ky in -1..1) {
                    for (kx in -1..1) {
                        val p = pixels[(y + ky) * w + (x + kx)]
                        rSum += (p shr 16) and 0xFF
                        gSum += (p shr 8) and 0xFF
                        bSum += p and 0xFF
                    }
                }
                val nr = rSum / 9
                val ng = gSum / 9
                val nb = bSum / 9
                out[y * w + x] = (0xFF shl 24) or (nr shl 16) or (ng shl 8) or nb
            }
        }
        val outBmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
        outBmp.setPixels(out, 0, w, 0, 0, w, h)
        return outBmp
    }

    private fun thresholdOtsu(src: Bitmap): Bitmap {
        val w = src.width
        val h = src.height
        val gray = IntArray(w * h)
        src.getPixels(gray, 0, w, 0, 0, w, h)

        // Histogram of the R channel (after grayscale, R = G = B).
        val hist = IntArray(256)
        for (p in gray) {
            hist[(p shr 16) and 0xFF]++
        }

        // Otsu: pick the threshold that maximizes the between-class variance.
        // Sums are Long so that large images cannot overflow.
        val total = w * h
        var sum = 0L
        for (t in 0..255) sum += t.toLong() * hist[t]
        var sumB = 0L
        var wB = 0
        var varMax = 0.0
        var threshold = 0
        for (t in 0..255) {
            wB += hist[t]
            if (wB == 0) continue
            val wF = total - wB
            if (wF == 0) break
            sumB += t.toLong() * hist[t]
            val mB = sumB.toDouble() / wB
            val mF = (sum - sumB).toDouble() / wF
            val between = wB.toDouble() * wF.toDouble() * (mB - mF) * (mB - mF)
            if (between > varMax) {
                varMax = between
                threshold = t
            }
        }

        // Apply the threshold.
        val out = IntArray(w * h)
        for (i in gray.indices) {
            val v = (gray[i] shr 16) and 0xFF
            out[i] = if (v > threshold) Color.WHITE else Color.BLACK
        }
        val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
        bmp.setPixels(out, 0, w, 0, 0, w, h)
        return bmp
    }

    private fun simpleMorphology(src: Bitmap): Bitmap {
        // Simple 3x3 dilation followed by erosion of the white regions.
        val w = src.width
        val h = src.height
        val pixels = IntArray(w * h)
        src.getPixels(pixels, 0, w, 0, 0, w, h)
        val dilated = spread(pixels, w, h, Color.WHITE) // dilation: grow white regions
        val eroded = spread(dilated, w, h, Color.BLACK) // erosion: shrink them back
        val bmp = Bitmap.createBitmap(w, h, Bitmap.Config.ARGB_8888)
        bmp.setPixels(eroded, 0, w, 0, 0, w, h)
        return bmp
    }

    // Sets an interior pixel to `target` if any pixel in its 3x3 neighbourhood is `target`.
    // Border pixels are left unchanged.
    private fun spread(input: IntArray, w: Int, h: Int, target: Int): IntArray {
        val out = input.copyOf()
        for (y in 1 until h - 1) {
            for (x in 1 until w - 1) {
                var found = false
                for (ky in -1..1) {
                    for (kx in -1..1) {
                        if (input[(y + ky) * w + (x + kx)] == target) found = true
                    }
                }
                if (found) out[y * w + x] = target
            }
        }
        return out
    }

    // ------- ML Kit call -------
    private fun runTextRecognition(bitmap: Bitmap) {
        val image = InputImage.fromBitmap(bitmap, 0)
        recognizer.process(image)
            .addOnSuccessListener { visionText ->
                val raw = visionText.text
                val cleaned = filterAlphaNum(raw)
                resultText.text = "Raw: $raw\nCleaned: $cleaned"
            }
            .addOnFailureListener { e ->
                resultText.text = "Error: ${e.message}"
            }
    }

    private fun filterAlphaNum(s: String): String {
        // Keep only upper/lower-case letters and digits; spaces and line breaks are removed too.
        return s.replace(Regex("[^A-Za-z0-9]"), "")
    }

}

Notes:

When btnProcess is clicked, this code runs the preprocessing and then calls ML Kit for recognition.

toGrayscale converts the image to grayscale with a ColorMatrix, which is efficient.
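To make the grayscale step concrete, here is a minimal per-pixel sketch in plain Kotlin. It assumes the luminance weights 0.213 / 0.715 / 0.072, which are the ones ColorMatrix.setSaturation(0f) is understood to apply; grayOf is a hypothetical name and is not part of the activity above.

```kotlin
import kotlin.math.roundToInt

// Hypothetical helper: converts one ARGB pixel to gray and keeps its alpha.
// The weights are an assumption mirroring ColorMatrix.setSaturation(0f).
fun grayOf(argb: Int): Int {
    val a = (argb ushr 24) and 0xFF
    val r = (argb shr 16) and 0xFF
    val g = (argb shr 8) and 0xFF
    val b = argb and 0xFF
    val y = (0.213 * r + 0.715 * g + 0.072 * b).roundToInt().coerceIn(0, 255)
    // Write the same level into R, G and B.
    return (a shl 24) or (y shl 16) or (y shl 8) or y
}
```

On a device the Canvas/ColorMatrix route is usually preferable, since it avoids a per-pixel loop in Kotlin; the function above is only meant to show what "R = G = B" means for a single pixel.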

thresholdOtsu implements Otsu's method, which derives the binarization threshold automatically from the image histogram.
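As a worked illustration of what thresholdOtsu computes, the sketch below runs the same between-class-variance search on a bare 256-bin histogram, without any Android types. otsuThreshold and toyHistogram are hypothetical stand-alone names, and the histogram values are made up for the example.

```kotlin
// Returns the gray level t that maximizes wB * wF * (mB - mF)^2,
// where the "background" class is levels 0..t and the "foreground" is the rest.
fun otsuThreshold(hist: IntArray): Int {
    require(hist.size == 256) { "expected a 256-bin histogram" }
    val total = hist.sum()
    var sumAll = 0L
    for (t in 0..255) sumAll += t.toLong() * hist[t]
    var sumB = 0L
    var wB = 0
    var best = 0.0
    var threshold = 0
    for (t in 0..255) {
        wB += hist[t]
        if (wB == 0) continue          // no background pixels yet
        val wF = total - wB
        if (wF == 0) break             // no foreground pixels left
        sumB += t.toLong() * hist[t]
        val mB = sumB.toDouble() / wB
        val mF = (sumAll - sumB).toDouble() / wF
        val between = wB.toDouble() * wF * (mB - mF) * (mB - mF)
        if (between > best) {
            best = between
            threshold = t
        }
    }
    return threshold
}

// Toy bimodal histogram: 100 dark pixels at level 50, 100 bright pixels at level 200.
fun toyHistogram(): IntArray = IntArray(256).also { it[50] = 100; it[200] = 100 }
```

For this toy histogram every cut from 50 through 199 separates the two clusters equally well. Because the search only accepts a strictly larger score, otsuThreshold(toyHistogram()) returns the first of them, 50. Pixels above the threshold become white, so level 200 maps to white and level 50 to black.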

runTextRecognition uses ML Kit's on-device API; once recognition completes, filterAlphaNum applies a whitelist filter to the result.
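The whitelist step can be tried in isolation. The sketch below repeats the idea behind filterAlphaNum as a top-level function (keepAlphaNum, a hypothetical name) and adds a hypothetical length check, hasExpectedLength, with an assumed CAPTCHA length of 4. The length check is not part of the activity; it only illustrates how a caller might reject obviously wrong reads.

```kotlin
// Keeps only ASCII letters and digits; whitespace and punctuation are dropped.
fun keepAlphaNum(s: String): String = s.replace(Regex("[^A-Za-z0-9]"), "")

// Hypothetical sanity check: many CAPTCHAs have a fixed length (4 is an assumption).
fun hasExpectedLength(cleaned: String, expected: Int = 4): Boolean =
    cleaned.length == expected
```

For example, keepAlphaNum(" 7K\n2p ") yields "7K2p", which passes hasExpectedLength, while keepAlphaNum("A b-3\nX9!") yields the five-character "Ab3X9", which does not.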

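For brevity, the image-processing functions are not heavily optimized. A real app should move the time-consuming work to a background thread (for example with coroutines or an ExecutorService).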
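One way to follow that advice is sketched below with a plain ExecutorService. BackgroundRunner and its deliver parameter are hypothetical names. On Android, deliver would typically hand the callback to the main thread (for example { runOnUiThread(it) } inside an Activity); the default used here simply runs it on the worker thread so that the snippet stays free of Android types.

```kotlin
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors

// Hypothetical helper: runs `work` off the calling thread and passes the
// result to `onDone` through `deliver`.
class BackgroundRunner(
    private val executor: ExecutorService = Executors.newSingleThreadExecutor(),
    private val deliver: (Runnable) -> Unit = { it.run() }
) {
    fun <T> run(work: () -> T, onDone: (T) -> Unit) {
        executor.execute {
            val result = work()                   // heavy part, e.g. preprocessing
            deliver(Runnable { onDone(result) })  // hand the result back
        }
    }

    // Stops accepting new work; call when the owner is destroyed.
    fun shutdown() = executor.shutdown()
}
```

In the activity this might be used as runner.run({ preprocessForOCR(bmp) }) { processed -> imageView.setImageBitmap(processed); runTextRecognition(processed) }, with the runner created as BackgroundRunner(deliver = { runOnUiThread(it) }) and shut down in onDestroy.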

6 Workflow and Testing

Launch the app, then tap Select to pick a CAPTCHA image from the gallery, or tap Capture to take a photo.

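Tap Process+OCR. The app first preprocesses the image (grayscale, upscaling, denoising, binarization, morphology) and shows the processed image in the ImageView, then calls ML Kit and displays both the Raw result and the Cleaned result (letters and digits only).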
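To see what a morphology pass does to a binarized image, the following stand-alone sketch applies a 3x3 dilation and then a 3x3 erosion of the white regions to a tiny grid, using 1 for white and 0 for black instead of Color constants. All function names are hypothetical, and it is an illustration rather than the activity's simpleMorphology.

```kotlin
// 1 = white, 0 = black. Border pixels are copied unchanged; an interior pixel
// becomes `target` when any pixel in its 3x3 neighbourhood equals `target`.
fun spread(grid: IntArray, w: Int, h: Int, target: Int): IntArray {
    val out = grid.copyOf()
    for (y in 1 until h - 1) {
        for (x in 1 until w - 1) {
            var found = false
            for (ky in -1..1) for (kx in -1..1) {
                if (grid[(y + ky) * w + (x + kx)] == target) found = true
            }
            if (found) out[y * w + x] = target
        }
    }
    return out
}

// Dilation grows white regions; erosion of white is the same as growing black.
fun dilateWhite(grid: IntArray, w: Int, h: Int) = spread(grid, w, h, 1)
fun erodeWhite(grid: IntArray, w: Int, h: Int) = spread(grid, w, h, 0)

// Dilation followed by erosion (a morphological closing of the white regions).
fun closeWhite(grid: IntArray, w: Int, h: Int) =
    erodeWhite(dilateWhite(grid, w, h), w, h)
```

On a 5x5 white grid with a single black pixel in the centre, closeWhite returns an all-white grid, so the speck is removed. On a 7x7 white grid with a 3x3 black block in the middle, the block comes back unchanged, which is why strokes a few pixels wide survive this pass while isolated noise pixels do not.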

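If the recognition results are unsatisfactory, adjust the post-processing after thresholdOtsu, the upscaling factor, or the morphology parameters, and test again.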
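The tunable parts mentioned above can be gathered in one place so that experiments change a single object. The sketch below is one assumption about how that might look: PreprocessConfig and binarize are hypothetical names, and thresholdBias is an extra offset added to whatever threshold Otsu's method returns.

```kotlin
// Hypothetical bundle of the knobs discussed in this section.
data class PreprocessConfig(
    val scale: Int = 2,             // upscaling factor
    val blur: Boolean = true,       // toggle the light blur
    val thresholdBias: Int = 0,     // shifts the Otsu threshold; > 0 turns more pixels black
    val morphology: Boolean = true  // toggle the morphology pass
)

// Binarizes gray levels (0..255) into 1 = white, 0 = black,
// using the Otsu threshold plus the configured bias.
fun binarize(gray: IntArray, otsu: Int, cfg: PreprocessConfig): IntArray {
    val t = (otsu + cfg.thresholdBias).coerceIn(0, 255)
    return IntArray(gray.size) { i -> if (gray[i] > t) 1 else 0 }
}
```

With gray levels 40, 120, 130, 220 and an Otsu threshold of 125, the default config gives 0, 0, 1, 1; a bias of +10 gives 0, 0, 0, 1; a bias of -10 gives 0, 1, 1, 1. For dark text on a light background, a positive bias tends to thicken faint strokes at the cost of more background noise, and a negative bias does the opposite.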

7 Practical Tips for Improving the Recognition Rate