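TinyML is a field of machine learning focused on bringing the power of AI to low-power devices. It is especially useful for applications that need real-time processing. One current challenge in machine learning is finding and collecting datasets. Synthetic data, however, makes it possible to train ML models in a way that is both cost-effective and adaptable, removing the need for large amounts of real-world data.
In this project, I will show you how to build a baby cry detection system by training a model on the Edge Impulse platform and deploying it to an edge device such as the Arduino Nicla Voice. By training the machine learning model on synthetic data, we can distinguish between a baby crying and background noise.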
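Here is a sneak peek of what's coming:
The diagram shows the components and steps involved in deploying a machine learning model that detects two situations, a baby crying and background noise, starting from text prompts generated with ChatGPT.
Here is a step-by-step breakdown of the components in the pipeline diagram and how they interact:
The output of the machine learning model could then be used to trigger actions, such as turning on a light or sending a notification to a smartphone.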
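The Arduino Nicla Voice is a development board created in collaboration with Syntiant. Thanks to Syntiant's ultra-low-power deep learning processor, the board provides always-on speech, gesture and motion recognition at the edge.
Because of its compact size, the Nicla Voice can be built into wearables, adding AI capabilities with minimal energy consumption. With the Nicla Voice you can develop custom speech recognition models and run them on the board, so that it recognizes specific words or phrases by analyzing your voice.
Let's get started!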
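Using ChatGPT to generate a variety of prompts simplifies writing prompts for my machine learning model, which has two classes: baby crying and background noise. It saves the time and effort I would otherwise spend brainstorming and writing prompts by hand. This approach can also produce a wider range of diverse prompts, which may improve the accuracy and effectiveness of the model.
Here are my text prompts for the baby crying class, generated with ChatGPT.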
prompts = [
"Baby Crying",
"Baby crying in bedroom",
"Baby crying loudly",
"Infant crying",
"Newborn crying",
"Crying baby",
"Upset baby",
"Distressed baby",
"Fussy baby",
"Weeping infant",
"Sobbing baby",
"Whimpering baby",
"Wailing baby",
"Bawling baby",
"Crying newborn",
"Tearful baby",
"Bawling infant",
"Mourning baby",
"Bellowing baby",
"Screaming baby",
"Howling baby",
"Squalling baby",
"Yowling baby",
"Crying baby in nursery",
"Wailing infant in bedroom",
"Whimpering baby in crib",
"Sobbing baby in bassinet",
"Crying baby in the dark",
"Upset baby in bed",
"Distressed baby in room",
"Fussy baby in cradle",
"Weeping infant in playpen",
"Sobbing baby in the corner",
"Whimpering baby in the closet",
"Wailing baby in the crib",
"Bawling baby in the nursery",
"Crying newborn in the bedroom",
"Tearful baby in the playroom",
"Bawling infant in the den",
"Mourning baby in the living room",
"Bellowing baby in the kitchen",
"Screaming baby in the bathroom",
"Howling baby in the hallway",
"Squalling baby in the dining room",
"Yowling baby in the family room",
"Crying baby in the middle of the night",
"Wailing infant in the early morning",
"Whimpering baby during naptime",
"Sobbing baby during mealtime",
"Crying baby during bathtime",
"Upset baby during diaper change",
"Distressed baby during playtime",
"Fussy baby during bedtime",
"Weeping infant during storytime",
"Sobbing baby during teething",
"Whimpering baby during vaccination",
"Wailing baby during check-up",
"Bawling baby during colic",
"Crying newborn during feeding",
"Tearful baby during immunization",
"Bawling infant during growth spurt",
"Mourning baby during illness",
"Bellowing baby during teething",
"Screaming baby during reflux",
"Howling baby during ear infection",
"Squalling baby during constipation",
"Yowling baby during sleep regression",
"Crying baby during travel",
"Wailing infant during car ride",
"Whimpering baby during flight",
"Sobbing baby during road trip",
"Crying baby during vacation",
"Upset baby during change of environment",
"Distressed baby during new experiences",
"Fussy baby during unfamiliar situations",
"Weeping infant during loud noises",
"Sobbing baby during separation anxiety",
"Whimpering baby during stranger danger",
"Wailing baby during socialization",
"Bawling baby during weaning",
"Crying newborn during swaddling",
"Tearful baby during bath",
"Bawling infant during burping",
"Mourning baby during pacifier weaning",
"Bellowing baby during crawling",
"Screaming baby during walking",
]
In addition, a language model like ChatGPT can help me come up with creative prompts I might not have thought of myself.
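The crying prompts above follow a visible pattern: a descriptor, a subject, and an optional setting or situation. As a complement to ChatGPT (not how the list above was produced), a minimal sketch of expanding such a pattern combinatorially could look like this. The descriptor, subject and context lists are hypothetical examples.

```python
from itertools import product


def expand_prompts(descriptors, subjects, contexts):
    """Combine every descriptor, subject and context into a prompt.

    An empty string in `contexts` yields the bare "<descriptor> <subject>" form.
    Duplicates are dropped while keeping first-seen order.
    """
    seen = set()
    prompts = []
    for desc, subj, ctx in product(descriptors, subjects, contexts):
        prompt = f"{desc} {subj} {ctx}".strip()
        if prompt not in seen:
            seen.add(prompt)
            prompts.append(prompt)
    return prompts


# Hypothetical inputs, not the author's list
descriptors = ["Crying", "Wailing", "Whimpering"]
subjects = ["baby", "infant"]
contexts = ["", "in the nursery", "during bathtime"]

prompts = expand_prompts(descriptors, subjects, contexts)
print(len(prompts))  # 18
print(prompts[:3])   # ['Crying baby', 'Crying baby in the nursery', 'Crying baby during bathtime']
```

Templates give breadth cheaply but produce very regular phrasing; ChatGPT-style rewording adds variety a fixed template cannot.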
These are the background noise prompts.
prompts = [
"A hammer is hitting a wooden surface",
"A noise of nature",
"The sound of waves crashing on the shore",
"A thunderstorm in the distance",
"Traffic noise on a busy street",
"The hum of an air conditioning unit",
"Birds chirping in the morning",
"The sound of a train passing by",
"A group of people talking in a crowded room",
"The sound of raindrops hitting a tin roof",
"The buzz of a fluorescent light",
"The sound of footsteps on a wooden floor",
"The crackling of a campfire",
"The whirring of a ceiling fan",
"The sound of a basketball bouncing on concrete",
"A dog barking in the distance",
"The rustling of leaves in the wind",
"The buzzing of a bee or other insect",
"The sound of a church bell ringing",
"The roar of a waterfall",
"The tapping of a keyboard",
"The hiss of a steam engine",
"The clanging of pots and pans in a kitchen",
"The sound of a roaring fire in a fireplace",
"The hum of an electric generator",
"The sound of a lawnmower in the distance",
"The whistling of wind through a window crack",
"The clatter of dishes in a busy restaurant",
"The sound of a helicopter flying overhead",
"The tapping of rain on a metal roof",
"The gentle rustling of a book's pages turning",
"The creaking of a wooden chair",
"The sound of a pencil scratching on paper",
"The chirping of crickets at night",
"The crackling of a vinyl record playing",
"The hissing of an old radio",
"The sound of a pencil sharpener grinding",
"The gurgling of a coffee maker",
"The sound of a ticking clock",
"The roar of an airplane engine",
"The bubbling of a fish tank filter",
"The clanking of dishes being washed in a sink",
"The sound of a typewriter clacking",
"The roar of a lion in the wild",
"The whirring of a drone flying overhead",
"The beeping of a car horn in traffic",
"The sound of a door creaking open",
"The buzzing of a mosquito in the room",
"The sound of a blender mixing ingredients",
"The rumbling of a thunderstorm overhead",
"The tapping of a woodpecker on a tree trunk",
"The rustling of paper being shuffled",
"The sound of a busy office with people talking on the phone and typing on their keyboards",
"The sound of a construction site with heavy machinery and drilling",
"The sound of a dishwasher running in the kitchen",
"The chirping of birds in a forest",
"The sound of a police siren in the distance",
"The whistling of wind through tall grass",
"The sound of a cash register in a busy store",
"The buzzing of a fly or bee flying around",
"The sound of a bicycle bell ringing",
"The crackling of a fire in a fireplace"
]
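That's all there is to the prompt side of dataset generation!
The next step is to turn the text into audio files with AudioLDM, a text-to-audio generation tool developed by researchers at the University of Surrey and Imperial College London in the UK. It uses a latent diffusion model to generate high-quality audio from text. To run AudioLDM you need a separate computer with a powerful CPU; a dedicated GPU is recommended but not mandatory. To test what AudioLDM can do first, you can try it online on Hugging Face.
Next, we will configure our Python environment. To manage virtual environments we'll use virtualenv, which can be installed as follows: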
sudo pip3 install virtualenv virtualenvwrapper
For virtualenvwrapper to work, we need to add a few lines to the ~/.bashrc file. Open it:
nano ~/.bashrc
and add the following lines:
# virtualenv and virtualenvwrapper
export WORKON_HOME=$HOME/.virtualenvs
export VIRTUALENVWRAPPER_PYTHON=/usr/bin/python3
source /usr/local/bin/virtualenvwrapper.sh
To apply the changes, run:
source ~/.bashrc
Now we can create a virtual environment with the mkvirtualenv command:
mkvirtualenv audioldm -p python3
Install PyTorch with pip:
pip3 install torch==2.0.0
Then install the audioldm package:
pip3 install audioldm
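To confirm both packages are importable and see whether generation could use a GPU or will fall back to the CPU, a quick check like the following may help. It relies only on `importlib.util.find_spec` and `torch.cuda.is_available()`; it makes no assumptions about how AudioLDM itself selects a device.

```python
import importlib.util


def environment_report():
    """Report which packages are importable and which device PyTorch would use."""
    report = {
        "torch": importlib.util.find_spec("torch") is not None,
        "audioldm": importlib.util.find_spec("audioldm") is not None,
        "device": "cpu",
    }
    if report["torch"]:
        import torch  # imported only if installed

        if torch.cuda.is_available():
            report["device"] = "cuda"
    return report


print(environment_report())
```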
Then run the following command to generate audio files from the ChatGPT text prompts. The script can be found in the GitHub code section below.
python3 generate.py
You should get output like the following:
genereated: A hammer is hitting a wooden surface
genereated: A noise of nature
genereated: The sound of waves crashing on the shore
genereated: A thunderstorm in the distance
genereated: Traffic noise on a busy street
genereated: The hum of an air conditioning unit
genereated: Birds chirping in the morning
genereated: The sound of a train passing by
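The generate.py script itself is not reproduced here. Below is a minimal sketch of what such a script does: it loops over the prompts and writes one WAV file per prompt. The synthesis function is a hypothetical placeholder (a quiet sine tone) so the pipeline can be exercised without a model. To use AudioLDM, you would swap in the package's text-to-audio call; I believe the audioldm package exposes `build_model()` and `text_to_audio()`, but check its README for the exact arguments. The 16 kHz mono 16-bit format and the `<label>.<name>.wav` naming are my assumptions. That naming is meant to let Edge Impulse's uploader infer the label from the part before the first dot, which is also worth verifying.

```python
import math
import re
import struct
import wave
from pathlib import Path

SAMPLE_RATE = 16_000  # assumption: 16 kHz mono output


def slugify(prompt: str) -> str:
    """Make a file-system-safe name, e.g. 'Baby crying loudly' -> 'baby_crying_loudly'."""
    return re.sub(r"[^a-z0-9]+", "_", prompt.lower()).strip("_")


def placeholder_synthesize(prompt: str, seconds: float = 1.0) -> list:
    """Hypothetical stand-in for a text-to-audio model such as AudioLDM.

    Returns a quiet sine tone so the loop can be tested without a model.
    """
    freq = 220 + (len(prompt) % 10) * 20
    n = int(SAMPLE_RATE * seconds)
    return [0.1 * math.sin(2 * math.pi * freq * i / SAMPLE_RATE) for i in range(n)]


def save_wav(samples, path: Path) -> None:
    """Write float samples in [-1, 1] as 16-bit little-endian mono PCM."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(SAMPLE_RATE)
        clipped = (max(-1.0, min(1.0, s)) for s in samples)
        wf.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in clipped))


def generate(prompts, out_dir, label, synthesize=placeholder_synthesize):
    """Synthesize one WAV per prompt into out_dir/label/ and return the paths."""
    target = Path(out_dir) / label
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for prompt in prompts:
        path = target / f"{label}.{slugify(prompt)}.wav"
        save_wav(synthesize(prompt), path)
        print(f"generated: {prompt}")
        written.append(path)
    return written
```

For example, `generate(["Baby crying loudly", "Infant crying"], "dataset", "baby_crying")` writes two files under `dataset/baby_crying/`. Prompts that slugify to the same name would overwrite each other, so keep the prompt list free of near-identical entries.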
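Once the WAV audio samples have been collected, they can be fed into a neural network to start training a model that automatically detects whether a baby is crying or only background noise is present.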
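Before uploading, it can be worth checking that every clip has a consistent format. The sketch below uses only Python's `wave` module and flags any file that isn't 16 kHz, mono, 16-bit and at least 1 s long. Those targets are my assumptions for a keyword-spotting-style pipeline, not requirements stated in this project.

```python
import wave
from pathlib import Path


def check_wav(path, rate=16_000, min_seconds=1.0):
    """Return a list of problems found in one WAV file (empty list means OK)."""
    problems = []
    with wave.open(str(path), "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate {wf.getframerate()} != {rate}")
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected mono")
        if wf.getsampwidth() != 2:
            problems.append(f"{8 * wf.getsampwidth()}-bit samples, expected 16-bit")
        duration = wf.getnframes() / wf.getframerate()
        if duration < min_seconds:
            problems.append(f"only {duration:.2f} s long")
    return problems


def audit(folder, **kwargs):
    """Map each problematic .wav file under `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).rglob("*.wav")):
        issues = check_wav(path, **kwargs)
        if issues:
            report[str(path)] = issues
    return report
```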
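Edge Impulse is a web-based tool that helps us create AI models for all kinds of projects quickly and easily. A machine learning model can be built in a few simple steps, and users can build custom classifiers, such as the audio classifier in this project, with nothing but a web browser.
Go to the Arduino Cloud platform, log in with your credentials (or create an account), and start a new project.
Download the Google Speech Commands Dataset to get data for the "background noise" class. The dataset can be downloaded as follows: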
wget http://download.tensorflow.org/data/speech_commands_v0.02.tar.gz
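As far as I know, the Speech Commands v0.02 archive keeps its long background recordings (white noise, running tap and similar) in a `_background_noise_` folder, each a minute or more long. A minimal sketch, assuming that layout, extracts just those WAV files with the standard `tarfile` module and cuts them into 1-second clips. The 1 s clip length is my assumption, chosen to match a roughly 1 s classification window.

```python
import tarfile
import wave
from pathlib import Path


def extract_background_noise(archive, out_dir):
    """Extract only the _background_noise_ WAV files from the Speech Commands tarball."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    extracted = []
    with tarfile.open(str(archive), "r:gz") as tar:
        for member in tar.getmembers():
            if (member.isfile() and "_background_noise_/" in member.name
                    and member.name.endswith(".wav")):
                dest = out / Path(member.name).name  # flatten, avoids path traversal
                with tar.extractfile(member) as src:
                    dest.write_bytes(src.read())
                extracted.append(dest)
    return extracted


def split_into_clips(path, out_dir, seconds=1.0):
    """Cut a long WAV into consecutive clips of `seconds` each; a short tail is dropped."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    clips = []
    with wave.open(str(path), "rb") as src:
        params = src.getparams()
        frames_per_clip = int(params.framerate * seconds)
        bytes_per_clip = frames_per_clip * params.sampwidth * params.nchannels
        index = 0
        while True:
            data = src.readframes(frames_per_clip)
            if len(data) < bytes_per_clip:
                break
            clip_path = out / f"{Path(path).stem}_{index:04d}.wav"
            with wave.open(str(clip_path), "wb") as dst:
                dst.setnchannels(params.nchannels)
                dst.setsampwidth(params.sampwidth)
                dst.setframerate(params.framerate)
                dst.writeframes(data)
            clips.append(clip_path)
            index += 1
    return clips
```

Usage might look like `for f in extract_background_noise("speech_commands_v0.02.tar.gz", "bg"): split_into_clips(f, "dataset/noise")`. Writing the member bytes directly, instead of calling `extractall`, keeps files from the archive from landing outside `out_dir`.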
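Upload the synthetic WAV audio files together with the "background noise" class from the Google Speech Commands Dataset. In my case, I uploaded about 500 WAV files. If needed, you can add more files later by labeling them, uploading them in Data acquisition, and retraining the model.
Once all your classes are set up and you are happy with your dataset, it's time to train the model. Navigate to Create Impulse in the left-hand navigation menu.
Select Add a processing block and add Audio (Syntiant), since it is well suited to boards based on the Syntiant NDP120. It converts the audio into features based on time and frequency, which help with classification. Then select Add a learning block and add Classification with two output classes.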
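Then navigate to Syntiant. We will keep the default parameters there. Click Save parameters.
Finally, click the Generate features button. You should get a response like the one shown below.
Click the Start training button to train the model. This can take around 5 to 10 minutes, depending on the size of your dataset. If everything goes well, you should see the following in Edge Impulse:
We got a validation accuracy of 90.7%. You shouldn't expect 100% accuracy on your training dataset, because that can be a sign of an overfitted model. I consider anything above 70% to be good model performance for this project. Increasing the number of training epochs may raise the accuracy further.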
The .tflite file is our model. The final quantized (int8) model file is about 5 KB, with accuracy close to 90%.
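The precision loss from int8 quantization comes from mapping each float onto one of only 256 integer levels. The sketch below follows the affine scheme TensorFlow Lite documents, real_value = scale × (q - zero_point). Under that scheme, any value inside the calibrated range is reconstructed to within half a step (scale / 2). The Syntiant toolchain may quantize differently, and the [-1, 1] range used here is an arbitrary example, not this model's actual parameters.

```python
def params_for_range(lo, hi):
    """Pick scale and zero_point so the float range [lo, hi] maps onto int8 [-128, 127]."""
    scale = (hi - lo) / 255
    zero_point = round(-128 - lo / scale)
    return scale, zero_point


def quantize(x, scale, zero_point):
    """Affine int8 quantization: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))


def dequantize(q, scale, zero_point):
    """Map an int8 value back to an approximate float: scale * (q - zero_point)."""
    return scale * (q - zero_point)


scale, zp = params_for_range(-1.0, 1.0)
for x in (-0.9, 0.0, 0.3):
    q = quantize(x, scale, zp)
    print(x, q, dequantize(q, scale, zp))
```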
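It's always interesting to look at the model architecture and at the formats and shapes of its inputs and outputs. You can use a program like Netron to view the neural network.
Clicking serving_default_x:0 shows that the input is of type int8 with shape [1, 1600]. Now let's look at the output: we have 2 classes, so the output shape is [1, 2]. Quantization can reduce model performance, because going from a 32-bit floating-point representation to an 8-bit integer one means a loss of precision.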
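Once you have finished building the model, go to the Deployment section and deploy it to one of the supported edge devices. ML model deployment is the process of putting a trained and tested ML model into a production environment, such as an edge device, where it can serve its intended purpose.
Go to the Deployment tab in Edge Impulse and select your edge device's firmware type; here, it is the Arduino Nicla Voice.
You may see the following log message: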
Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device. | | Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)
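This information matters because it shows how memory-efficient the model is and whether it can be deployed on a resource-constrained device like the Arduino Nicla Voice. Here the model parameters take 1.375 KB of the 640 KB available, roughly 0.2%.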
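The log line is easy to turn into numbers. The small sketch below parses it, expresses the memory use as a percentage, and converts energy per inference into average power. Microjoules per inference multiplied by inferences per second gives microwatts. The rate of 2 inferences per second is purely hypothetical, and the estimate covers only the model's own inference energy, not the rest of the board.

```python
import re

LOG = ("Total Parameter Memory: 1.375 KB out of 640.0 KB on the NDP120_B0 device. | | "
       "Estimated Model Energy/Inference at 0.9V: 5.55404 (uJ)")


def parse_ndp_log(text):
    """Extract (used KB, total KB, energy per inference in uJ) from the deployment log."""
    mem = re.search(r"Total Parameter Memory: ([\d.]+) KB out of ([\d.]+) KB", text)
    energy = re.search(r"Energy/Inference at [\d.]+V: ([\d.]+) \(uJ\)", text)
    if not (mem and energy):
        raise ValueError("log line not in the expected format")
    return float(mem.group(1)), float(mem.group(2)), float(energy.group(1))


used_kb, total_kb, energy_uj = parse_ndp_log(LOG)
print(f"memory used: {100 * used_kb / total_kb:.2f}%")  # memory used: 0.21%

# Hypothetical duty cycle: 2 inferences per second
inferences_per_s = 2
print(f"average inference power: {energy_uj * inferences_per_s:.1f} uW")  # 11.1 uW
```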
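I took the training data, trained a model in the cloud with the Edge Impulse platform, and am now running it locally on the Arduino Nicla Voice, so it has been successfully deployed to an edge device. The project could be extended by adding triggered actions, such as turning on a light or sending a notification to a smartphone.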
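Because the model produces a score for each audio window, a trigger probably shouldn't fire on a single high-scoring window. The minimal host-side sketch below assumes the per-window "baby crying" score is already available in Python (for example, read over serial, which is not shown). The class name, threshold and window counts are hypothetical choices. The action could be anything from switching on a light to sending a phone notification.

```python
from collections import deque


class CryTrigger:
    """Fire an action when the crying score stays high for several consecutive windows.

    Requiring `consecutive` hits avoids reacting to one noisy window, and
    `cooldown` further windows must pass before the action can fire again.
    """

    def __init__(self, action, threshold=0.8, consecutive=3, cooldown=10):
        self.action = action
        self.threshold = threshold
        self.recent = deque(maxlen=consecutive)
        self.cooldown = cooldown
        self.wait = 0

    def update(self, crying_score):
        """Feed one classifier score in [0, 1]; return True if the action fired."""
        if self.wait > 0:
            self.wait -= 1
        self.recent.append(crying_score >= self.threshold)
        if self.wait == 0 and len(self.recent) == self.recent.maxlen and all(self.recent):
            self.action()
            self.wait = self.cooldown
            self.recent.clear()
            return True
        return False


events = []
trigger = CryTrigger(lambda: events.append("notify"), threshold=0.8, consecutive=3, cooldown=5)
scores = [0.9, 0.95, 0.2, 0.9, 0.85, 0.92, 0.99, 0.97]
fired = [trigger.update(s) for s in scores]
print(fired)  # only the sixth window fires
```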
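In conclusion, by combining TinyML with synthetic data generated through ChatGPT prompts and text-to-audio, we can improve the efficiency and accuracy of detecting and responding to a baby crying. The project shows that artificial data generation can be effective and can remove the need to search manually for datasets.
Feel free to leave a comment below. Thanks for reading!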
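Statement: The content and images of this article were written by a contributing author or republished with permission from a partner website. The views expressed are the author's own and do not represent the position of Elecfans (电子发烧友网). The article and its images are provided for engineers' learning purposes only; in case of infringement or other violations, please contact the site.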