Recently, automatic speech and speaker recognition has matured to the degree that it has entered the daily lives of thousands of Europe's citizens, e.g., on their smartphones or in call services. In the coming years, speech processing technology will move to a new level of social awareness to make interaction more intuitive, speech retrieval more efficient, and to lend additional competence to computer-mediated communication and speech-analysis services in the commercial, health, security, and further sectors. To reach this goal, rich speaker traits and states such as age, height, personality, and physical and mental state, as carried by the tone of the voice and the spoken words, must be reliably identified by machines.
The iHEARu project aims to push the limits of intelligent systems for computational paralinguistics by considering Holistic analysis of multiple speaker attributes at once, Evolving and self-learning, deeper Analysis of acoustic parameters, all on Realistic data on a large scale, ultimately progressing from individual analysis tasks towards universal speaker characteristics analysis, which can easily learn about, and be adapted to, new, previously unexplored characteristics.
From a methodological point of view, today's speaker characteristic recognition mostly relies on standard machine learning techniques that have proven successful for various audio recognition tasks, including speech and speaker recognition. However, there still remains a major gap between today's systems and humans, who analyse speech in a holistic fashion, learn how speaker states and traits influence each other, and continuously improve their skills through interaction with others.
In the iHEARu project, ground-breaking methodology, including novel techniques for multi-task and semi-supervised learning, will deliver for the first time intelligent, holistic, and evolving analysis of universal speaker characteristics in real-life conditions; so far, these characteristics have been considered only in isolation. The current sparseness of annotated realistic speech data will be overcome by large-scale speech and meta-data mining from public sources such as social media, by crowd-sourcing for labelling and quality control, and by shared semi-automatic annotation. All stages, from pre-processing and feature extraction to statistical modelling, will evolve in "life-long learning" according to new data, utilising feedback, deep, and evolutionary learning methods. Human-in-the-loop system validation and novel perception studies will analyse the self-organising systems and the relation of automatic signal processing to human interpretation in a previously unseen variety of speaker classification tasks.
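To make the semi-supervised ingredient concrete, the following minimal self-training sketch shows one common way to exploit unlabelled speech data: a model trained on the labelled set pseudo-labels the unlabelled pool, and only its high-confidence predictions are added back for retraining. The classifier choice, threshold, and function names are illustrative assumptions, not the project's actual pipeline.

```python
# Minimal self-training loop (illustrative assumptions, not the iHEARu pipeline):
# fit on labelled data, pseudo-label the pool, keep confident predictions, retrain.
import numpy as np
from sklearn.linear_model import LogisticRegression

def self_train(X_lab, y_lab, X_unlab, rounds=3, confidence=0.9):
    X, y, pool = X_lab.copy(), y_lab.copy(), X_unlab.copy()
    model = LogisticRegression(max_iter=1000)
    for _ in range(rounds):
        model.fit(X, y)
        if len(pool) == 0:
            break
        proba = model.predict_proba(pool)
        sure = proba.max(axis=1) >= confidence   # accept confident pseudo-labels only
        preds = model.classes_[proba.argmax(axis=1)]
        X = np.vstack([X, pool[sure]])
        y = np.concatenate([y, preds[sure]])
        pool = pool[~sure]
    return model
```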
Today's studies consider speaker characteristics in isolation, i.e., a single, or only a few, speaker characteristics are considered at once. There is very little exploitation of the interplay and synergies between different characteristics, yet in reality, strong interdependencies between pieces of paralinguistic information exist. Still, before this can be exploited on a larger scale, richly annotated data sets will have to be created: at present, databases provide only one or a few speaker characteristics in parallel. The iHEARu project aims to provide the knowledge and technology required for a holistic understanding of all the paralinguistic facets of human speech in tomorrow's real-life information, communication, and entertainment systems.
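As a minimal sketch of what such holistic, multi-attribute modelling can look like, the PyTorch snippet below trains a shared acoustic encoder with one output head per speaker attribute; a per-attribute label mask lets corpora that annotate only some attributes still contribute to training. The attribute set, feature dimensionality, and network sizes are hypothetical, not the project's actual configuration.

```python
# Multi-task sketch (hypothetical names and dimensions): one shared encoder over
# acoustic features, one classification head per speaker attribute, and a label
# mask so partially annotated corpora still contribute to the joint loss.
import torch
import torch.nn as nn

ATTRIBUTES = {"age_group": 4, "gender": 2, "emotion": 6}  # attribute -> number of classes

class HolisticSpeakerNet(nn.Module):
    def __init__(self, n_features=88, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_features, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleDict(
            {name: nn.Linear(hidden, n) for name, n in ATTRIBUTES.items()}
        )

    def forward(self, x):
        h = self.encoder(x)
        return {name: head(h) for name, head in self.heads.items()}

def multitask_loss(outputs, labels, masks):
    """Sum per-attribute cross-entropy; masks zero out samples without that label."""
    ce = nn.CrossEntropyLoss(reduction="none")
    loss = 0.0
    for name, logits in outputs.items():
        per_sample = ce(logits, labels[name].clamp(min=0))  # clamp maps dummy -1 labels to a valid index
        loss = loss + (per_sample * masks[name]).sum() / masks[name].sum().clamp(min=1)
    return loss
```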
Self-learning and self-improvement in the iHEARu project will not be limited to iterative data collection. Rather, iHEARu will consider self-optimising feature extraction and self-organising classifiers: the whole process of speaker characteristics learning and analysis shall be self-optimising, as depicted in the flow chart above. For realising these ambitious goals, deep learning combined with neuroevolutionary methods and nonparametric Bayesian learning will play an essential role. Together, these provide promising means for creating self-optimising statistical models and hierarchical input representations with very little supervision.
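The following toy loop illustrates only the neuroevolutionary ingredient, under assumed hyper-parameters and a placeholder fitness function: a simple (mu + lambda) strategy mutates network configurations and keeps those with the best validation fitness.

```python
# Toy (mu + lambda) evolution over network hyper-parameters (assumed setup, not
# the project's actual system): mutate configurations, keep the fittest survivors.
import random

def mutate(cfg):
    child = dict(cfg)
    child["hidden"] = max(16, int(child["hidden"] * random.choice([0.5, 1.0, 2.0])))
    child["lr"] = child["lr"] * random.uniform(0.5, 2.0)
    child["layers"] = max(1, child["layers"] + random.choice([-1, 0, 1]))
    return child

def evolve(fitness, generations=10, mu=4, lam=8):
    """fitness(cfg) -> validation score, higher is better (e.g., UAR on a dev set)."""
    population = [{"hidden": 128, "lr": 1e-3, "layers": 2} for _ in range(mu)]
    for _ in range(generations):
        offspring = [mutate(random.choice(population)) for _ in range(lam)]
        scored = sorted(population + offspring, key=fitness, reverse=True)
        population = scored[:mu]  # survival of the fittest configurations
    return population[0]
```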
The iHEARu project approaches the acoustic feature generation and selection issue by trying to understand human reasoning under challenging conditions, ranging from very low SNR, voice conversion, and speech compression, all the way to the deliberate faking of voice or speaker states by the subjects. As a consequence, the iHEARu project will address not only environmental (technical) robustness, but, more importantly, also robustness against fraud.
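On the environmental-robustness side, one standard recipe is to train on copies of each utterance mixed with noise at controlled signal-to-noise ratios, so that models see degraded conditions during training. The sketch below shows only the mixing step; the function name and parameters are assumptions for illustration, not taken from the project's feature pipeline.

```python
# Noise augmentation at a controlled SNR (a sketch under assumed parameters):
# scale the noise so the clean/noise power ratio matches the target SNR, then mix.
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Return `clean` mixed with `noise` at the requested SNR in dB."""
    noise = np.resize(noise, clean.shape)            # loop/trim noise to length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

# e.g., train on copies of each utterance at 20, 10, 5, and 0 dB SNR
```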
To automatically obtain robust speech detection and segmentation into meaningful units, the iHEARu project aims to improve all of the pre-processing algorithms, including speech separation, noise reduction, voice activity detection, and segmentation, in a loop with the subsequent analysis algorithms and the confidence scores these provide (cf. flowchart). Further, dealing with real-life data also means coping with various transmission channels.
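Such a feedback loop could be prototyped along the following lines: a simple energy-based voice activity detector produces candidate segmentations at several thresholds, and the downstream analyser's confidence decides which one is kept. The `analyse` interface, threshold values, and frame length are hypothetical.

```python
# Sketch of the confidence-driven pre-processing loop (hypothetical interfaces):
# an energy-based VAD whose threshold is selected by downstream confidence.
import numpy as np

def energy_vad(signal, frame_len=400, threshold_db=-35.0):
    """Return boolean speech/non-speech decisions per frame (toy energy VAD)."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy_db = 10 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > threshold_db

def segment_with_feedback(signal, analyse, thresholds=(-45.0, -35.0, -25.0)):
    """Try several VAD thresholds; keep the segmentation the analyser is most sure of.
    `analyse(mask) -> confidence` is an assumed interface to the downstream model."""
    best = max(thresholds, key=lambda t: analyse(energy_vad(signal, threshold_db=t)))
    return energy_vad(signal, threshold_db=best)
```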
The iHEARu project addresses the automatic recognition of speaker attributes and speaking styles that can be clearly identified by humans. However, the iHEARu approach to universal analysis is not simply to define more and more new recognition tasks chosen 'ad hoc'; rather, it aims at developing data-driven methods for a framework that can automatically identify characteristics of interest by looking at crowd-sourced resources, such as tag collections, opinions in textual comments, or explicitly collected annotations from paid click-workers.
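In the spirit of the project's trustability-based crowdsourcing work (here heavily simplified, and not the published Weighted Trustability Evaluator), noisy crowd labels can be aggregated by weighting each rater's votes by how often they agree with the current consensus, iterating until the labels stabilise:

```python
# Simplified trust-weighted label aggregation (illustrative, not the published
# Weighted Trustability Evaluator): raters who agree with the consensus more
# often get a larger say in the next round of majority voting.
from collections import Counter, defaultdict

def aggregate(votes, iterations=5):
    """votes: list of (rater_id, item_id, label) tuples. Returns item_id -> label."""
    trust = defaultdict(lambda: 1.0)   # start with equal trust for every rater
    labels = {}
    for _ in range(iterations):
        weighted = defaultdict(Counter)
        for rater, item, label in votes:
            weighted[item][label] += trust[rater]
        labels = {item: c.most_common(1)[0][0] for item, c in weighted.items()}
        agree = defaultdict(list)
        for rater, item, label in votes:
            agree[rater].append(1.0 if labels[item] == label else 0.0)
        trust = {r: sum(a) / len(a) for r, a in agree.items()}  # update trust scores
    return labels
```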
A. Baird, S. H. Jorgensen, E. Parada-Cabaleiro, S. Hantke, N. Cummins, and B. Schuller, “Listener Perception of Vocal Traits in Synthesized Voices: Age, Gender, and Human- Likeness,” Journal of the Audio Engineering Society, Special Issue on Augmented and Participatory Sound and Music Interaction using Semantic Audio, vol. 66, pp. 277-285, April 2018. [link]
E. Coutinho, K. Gentsch, J. van Peer, K. R. Scherer, and B. Schuller, “Evidence of Emotion-Antecedent Appraisal Checks in Electroencephalography and Facial Electromyography,” PLOS ONE, vol. 13, e0189367, 19 pages, January 2018. [link]
N. Cummins, A. Baird, and B. Schuller, “Speech Analysis for Health: Current State-of-the-art and the Increasing Impact of Deep Learning,” Methods, Special Issue on Translational data analytics and health informatics, vol. 151, pp. 41-54, 2018. [link]
J. Deng, Z. Zhang, F. Eyben, and B. Schuller, “Autoencoder-based Unsupervised Domain Adaptation for Speech Emotion Recognition,” IEEE Signal Processing Letters, vol. 21, pp. 1068-1072, 2014. [link]
J. Deng, X. Xu, Z. Zhang, S. Frühholz, and B. Schuller, “Semi-Supervised Autoencoders for Speech Emotion Recognition,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 26, pp. 31-43, January 2018. [link]
J. Deng, X. Xu, Z. Zhang, S. Frühholz, and B. Schuller, “Universum Autoencoder-based Domain Adaptation for Speech Emotion Recognition,” IEEE Signal Processing Letters, vol. 24, pp. 500-504, April 2017. [link]
J. Deng, S. Frühholz, Z. Zhang, and B. Schuller, “Recognizing Emotions from Whispered Speech Based on Acoustic Feature Transfer Learning,” IEEE Access, vol. 5, pp. 5235-5246, March 2017. [link]
J. Deng, X. Xu, Z. Zhang, S. Frühholz, and B. Schuller, “Exploitation of Phase-based Features for Whispered Speech Emotion Recognition,” IEEE Access, vol. 4, pp. 4299-4309, July 2016. [link]
F. Eyben, G. L. Salomão, J. Sundberg, K. Scherer, and B. Schuller, “Emotion in the Singing Voice – a Deeper Look at Acoustic Features in the Light of Automatic Classification,” EURASIP Journal on Audio, Speech, and Music Processing, Special Issue on Scalable Audio-Content Analysis, vol. 19, doi: 10.1186/s13636-015-0057-6, 9 pages, 2015. [link]
M. Freitag, S. Amiriparian, S. Pugachevskiy, N. Cummins, and B. Schuller, “auDeep: Unsupervised Learning of Representations from Audio with Deep Recurrent Neural Networks,” Journal of Machine Learning Research, vol. 18, pp. 1-5, April 2018. [link]
S. Frühholz, E. Marchi, and B. Schuller, “The Effect of Narrow-band Transmission on Recognition of Paralinguistic Information from Human Vocalizations,” IEEE Access, vol. 4, pp. 6059-6072, August 2016. [link]
J. Han, Z. Zhang, N. Cummins, F. Ringeval, and B. Schuller, “Strength Modelling for Real-World Automatic Continuous Affect Recognition from Audiovisual Signals,” Image and Vision Computing, Special Issue on Multimodal Sentiment Analysis and Mining in the Wild, vol. 34, pp. 76-86, 2016. [link]
S. Hantke, A. Abstreiter, N. Cummins, and B. Schuller, “Trustability-based Dynamic Active Learning for Crowdsourced Labelling of Emotional Audio Data,” IEEE Access, vol. 6, pp. 42142-42155, July 2018. [link]
S. Hantke, F. Weninger, R. Kurle, F. Ringeval, A. Batliner, A. Mousa, and B. Schuller, “I Hear You Eat and Speak: Automatic Recognition of Eating Condition and Food Type, Use-Cases, and Impact on ASR Performance,” PLOS ONE, vol. 11, e0154486, 24 pages, May 2016. [link]
G. Keren, N. Cummins, and B. Schuller, “Calibrated Prediction Intervals for Neural Network Regressors,” IEEE Access, vol. 6, pp. 54033-54041, 2018. [link]
E. Marchi, F. Vesperini, S. Squartini, and B. Schuller, “Deep Recurrent Neural Network-based Autoencoders for Acoustic Novelty Detection,” Computational Intelligence and Neuroscience, vol. 2017, Article ID 4694860, 14 pages, January 2017. [link]
A. Mencattini, E. Martinelli, F. Ringeval, B. Schuller, and C. Di Natale, “Continuous Estimation of Emotions in Speech by Dynamic Cooperative Speaker Models,” IEEE Transactions on Affective Computing, vol. 8, pp. 314-327, July–September 2017. [pdf]
F. Mosciano, A. Mencattini, F. Ringeval, B. Schuller, E. Martinelli, and C. Di Natale, “An array of physical sensors and an adaptive regression strategy for emotion recognition in a noisy scenario,” Sensors & Actuators A: Physical, vol. 267, pp. 48-59, November 2017. [link]
E. Parada-Cabaleiro, A. Batliner, A. Baird, and B. Schuller, “The Perception of Multi-modal Emotional Cues by Children in Artificial Background Noise,” Language and Speech, 2019. 12 pages, to appear [link not yet available]
E. Parada-Cabaleiro, G. Costantini, A. Batliner, M. Schmitt, and B. Schuller, “DEMoS – An Italian Emotional Speech Corpus. Elicitation methods, machine learning, and perception,” Language Resources and Evaluation, 2019. 39 pages, to appear [link not yet available]
K. Qian, Z. Zhang, A. Baird, and B. Schuller, “Active Learning for Bird Sounds Classification,” Acta Acustica united with Acustica, vol. 103, pp. 361-364, April 2017. [link]
F. Ringeval, F. Eyben, E. Kroupi, A. Yuce, J.-P. Thiran, T. Ebrahimi, D. Lalanne, and B. Schuller, “Prediction of Asynchronous Dimensional Emotion Ratings from Audiovisual and Physiological Data,” Pattern Recognition Letters, vol. 66, pp. 22-30, November 2015. [link]
H. Sagha, N. Cummins, and B. Schuller, “Stacked Denoising Autoencoders for Sentiment Analysis: a Review,” WIREs Data Mining and Knowledge Discovery, vol. 7, doi: 10.1002/widm.1212, 9 pages, September/October 2017. [link]
M. Schmitt and B. Schuller, “openXBOW – Introducing the Passau Open-Source Crossmodal Bag-of-Words Toolkit,” Journal of Machine Learning Research, vol. 18, pp. 3370-3374, 2017. [link]
B. Schuller, A. E.-D. Mousa, and V. Vryniotis, “Sentiment Analysis and Opinion Mining: On Optimal Parameters and Performances,” WIREs Data Mining and Knowledge Discovery, vol. 5, pp. 255-263, September/October 2015. [link]
B. Schuller, S. Steidl, A. Batliner, E. Nöth, A. Vinciarelli, F. Burkhardt, R. van Son, F. Weninger, F. Eyben, T. Bocklet, G. Mohammadi, and B. Weiss, “A Survey on Perceived Speaker Traits: Personality, Likability, Pathology, and the First Challenge,” Computer Speech and Language, Special Issue on Next Generation Computational Paralinguistics, vol. 29, pp. 100-131, January 2015. [link]
B. Schuller, F. Weninger, Y. Zhang, F. Ringeval, A. Batliner, S. Steidl, F. Eyben, E. Marchi, A. Vinciarelli, K. Scherer, M. Chetouani, and M. Mortillaro, “Affective and Behavioural Computing: Lessons Learnt from the First Computational Paralinguistics Challenge,” Computer Speech and Language, vol. 53, pp. 156-180, March 2018. [link]
X. Xu, J. Deng, N. Cummins, Z. Zhang, C. Wu, L. Zhao, and B. Schuller, “A Two-Dimensional Framework of Multiple Kernel Subspace Learning for Recognising Emotion in Speech,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 25, pp. 1436-1449, July 2017. [link]
X. Xu, J. Deng, E. Coutinho, C. Wu, L. Zhao, and B. Schuller, “Connecting Subspace Learning and Extreme Learning Machine in Speech Emotion Recognition,” IEEE Transactions on Multimedia, vol. 20, doi: 10.1109/TMM.2018.2865834, 13 pages, August 2018. [link]
Z. Zhang, N. Cummins, and B. Schuller, “Advanced Data Exploitation in Speech Analysis: An Overview,” IEEE Signal Processing Magazine, vol. 34, pp. 107-129, July 2017. [link]
Z. Zhang, E. Coutinho, J. Deng, and B. Schuller, “Cooperative Learning and its Application to Emotion Recognition from Speech,” IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 23, pp. 115-126, 2015. [link]
Z. Zhang, E. Coutinho, J. Deng, and B. Schuller, “Distributing Recognition in Computational Paralinguistics,” IEEE Transactions on Affective Computing, vol. 5, pp. 406-417, October–December 2014. [link]
S. Amiriparian, M. Freitag, N. Cummins, M. Gerzcuk, S. Pugachevskiy, and B. W. Schuller, “A Fusion of Deep Convolutional Generative Adversarial Networks and Sequence to Sequence Autoencoders for Acoustic Scene Classification,” in Proceedings of the 26th European Signal Processing Conference (EUSIPCO 2018), (Rome, Italy), pp. 977-981, EURASIP, IEEE, September 2018. [link]
S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, M. Freitag, S. Pugachevskiy, and B. Schuller, “Snore Sound Classification Using Image-based Deep Spectrum Features,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3512-3516, ISCA, ISCA, August 2017. [link]
S. Amiriparian, M. Gerczuk, S. Ottl, N. Cummins, S. Pugachevskiy, and B. Schuller, “Bag-of-Deep-Features: Noise-Robust Deep Feature Representations for Audio Analysis,” in Proceedings of the 31st International Joint Conference on Neural Networks (IJCNN 2018), (Rio de Janeiro, Brazil), 7 pages [no pagination], IEEE, IEEE, July 2018. [link]
S. Amiriparian, J. Pohjalainen, E. Marchi, S. Pugachevskiy, and B. Schuller, “Is Deception Emotional? An Emotion-driven Predictive Approach,” in Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 2011-2015, ISCA, ISCA, September 2016. [link]
S. Amiriparian, S. Pugachevskiy, N. Cummins, S. Hantke, J. Pohjalainen, G. Keren, and B. Schuller, “CAST a database: Rapid targeted large-scale big data acquisition via small-world modelling of social media platforms,” in Proceedings of the 7th International Conference on Affective Computing and Intelligent Interaction (ACII 2017), (San Antonio, TX, USA), pp. 340-345, AAAC, IEEE, October 2017. [link]
S. Amiriparian, M. Schmitt, N. Cummins, K. Qian, F. Dong, and B. Schuller, “Deep Unsupervised Representation Learning for Abnormal Heart Sound Classification,” in Proceedings of the 40th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 2018), (Honolulu, HI, USA), pp. 4776-4779, IEEE, IEEE, July 2018. [link]
A. Baird, S. Amiriparian, N. Cummins, A. M. Alcorn, A. Batliner, S. Pugachevskiy, M. Freitag, M. Gerczuk, and B. Schuller, “Automatic Classification of Autistic Child Vocalisations: A Novel Database and Results,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 849-853, ISCA, ISCA, August 2017. [link]
A. Baird, S. H. Jorgensen, E. Parada-Cabaleiro, S. Hantke, N. Cummins, and B. Schuller, “Perception of Paralinguistic Traits in Synthesized Voices,” in Proceedings of the 11th Audio Mostly Conference on Interaction with Sound (AM 2017), (London, UK), Article 17, 5 pages, ACM, ACM, August 2017. [link]
A. Baird, E. Parada-Cabaleiro, C. Fraser, S. Hantke, and B. Schuller, “The Perceived Emotion of Isolated Synthetic Audio: The EmoSynth Dataset and Results,” in Proceedings of the 12th Audio Mostly Conference on Interaction with Sound (AM 2018), (Wrexham, UK), Article 7, 8 pages, ACM, ACM, September 2018. [link]
A. Baird, E. Parada-Cabaleiro, S. Hantke, F. Burkhardt, N. Cummins, and B. Schuller, “The Perception and Analysis of the Likeability and Human Likeness of Synthesized Speech,” in Proceedings of INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, (Hyderabad, India), pp. 2863-2867, ISCA, ISCA, September 2018. [link]
A. Batliner and B. Schuller, “More Than Fifty Years of Speech Processing – The Rise of Computational Paralinguistics and Ethical Demands,” in Proceedings of ETHICOMP 2014, (Paris, France), Paper 68, 11 pages, ETHICOMP, CERNA, June 2014. [no link available]
J. Böhm, F. Eyben, M. Schmitt, H. Kosch, and B. Schuller, “Seeking the SuperStar: Automatic Assessment of Perceived Singing Quality,” in Proceedings of the 30th International Joint Conference on Neural Networks (IJCNN 2017), (Anchorage, AK, USA), pp. 1560-1569, IEEE, IEEE, May 2017. [link]
R. Brückner, M. Schmitt, M. Pantic, and B. Schuller, “Spotting Social Signals in Conversational Speech over IP: A Deep Learning Perspective,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 2371-2375, ISCA, ISCA, August 2017. [link]
E. Coutinho, J. Deng, and B. Schuller, “Transfer Learning Emotion Manifestation Across Music and Speech,” in Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN 2014) as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI 2014), (Beijing, China), pp. 3592-3598, IEEE, IEEE, July 2014. [link]
E. Coutinho, F. Hönig, Y. Zhang, S. Hantke, A. Batliner, E. Nöth, and B. Schuller, “Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets,” in Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), (Portorož, Slovenia), pp. 1328-1332, ELRA, ELRA, May 2016. [link]
N. Cummins, S. Amiriparian, G. Hagerer, A. Batliner, S. Steidl, and B. Schuller, “An Image-based Deep Spectrum Feature Representation for the Recognition of Emotional Speech,” in Proceedings of the 25th ACM International Conference on Multimedia (MM 2017), (Mountain View, CA, USA), pp. 478-484, ACM, ACM, October 2017. [link]
N. Cummins, S. Amiriparian, S. Ottl, M. Gerczuk, M. Schmitt, and B. Schuller, “Multimodal Bag-of-Words for Cross Domains Sentiment Analysis,” in Proceedings of the 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP 2018), (Calgary, Canada), pp. 4954-4958, IEEE, IEEE, April 2018. [link]
N. Cummins, S. Hantke, S. Schnieder, J. Krajewski, and B. Schuller, “Classifying the Context and Emotion of Dog Barks: A Comparison of Acoustic Feature Representations,” in Proceedings of the Pre-Conference on Affective Computing 2017 SAS Annual Conference, (Boston, MA), pp. 14-15, SAS, April 2017. [no link available]
N. Cummins, M. Schmitt, S. Amiriparian, J. Krajewski, and B. Schuller, “‘You sound ill, take the day off’: Automatic Recognition of Speech Affected by Upper Respiratory Tract Infection,” in Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 2017), (Jeju Island, South Korea), pp. 3806-3809, IEEE, IEEE, July 2017. [link]
N. Cummins, B. Vlasenko, H. Sagha, and B. Schuller, “Enhancing Speech-based Depression Detection Through Gender Dependent Vowel Level Formant Features,” in Proceedings of the 16th Conference on Artificial Intelligence in Medicine in Europe (AIME 2017), (Vienna, Austria), Lecture Notes in Computer Science (LNCS) series, vol. 10259, pp. 209-214, AIME, Springer, June 2017. [link]
F. Demir, A. Sengur, H. Lu, S. Amiriparian, N. Cummins, and B. Schuller, “Compact Bilinear Deep Features for Environmental Sound Recognition,” in Proceedings of the International Conference on Artificial Intelligence and Data Mining (IDAP 2018), (Malatya, Turkey), 5 pages [no pagination], IEEE, IEEE, September 2018. [link]
J. Deng, N. Cummins, J. Han, X. Xu, Z. Ren, V. Pandit, Z. Zhang, and B. Schuller, “The University of Passau Open Emotion Recognition System for the Multimodal Emotion Challenge,” in Proceedings of the 7th Chinese Conference on Pattern Recognition, Part II (CCPR 2016), (Chengdu, China), Communications in Computer and Information Science book series, vol. 663, pp. 652-666, CCPR, Springer, November 2016. [link]
J. Deng, N. Cummins, M. Schmitt, K. Qian, F. Ringeval, and B. Schuller, “Speech-based Diagnosis of Autism Spectrum Condition by Generative Adversarial Network Representations,” in Proceedings of the 7th International Digital Health Conference (DH 2017), (London, UK), pp. 53-57, ACM, ACM, July 2017. [link]
J. Deng, R. Xia, Z. Zhang, Y. Liu, and B. Schuller, “Introducing Shared-Hidden-Layer Autoencoders for Transfer Learning and their Application in Acoustic Emotion Recognition,” in Proceedings of the 39th IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP 2014), (Florence, Italy), pp. 4851-4855, IEEE, IEEE, May 2014. [link]
J. Deng, Z. Zhang, and B. Schuller, “Linked Source and Target Domain Subspace Feature Transfer Learning – Exemplified by Speech Emotion Recognition,” in Proceedings of the 22nd International Conference on Pattern Recognition (ICPR 2014), (Stockholm, Sweden), pp. 761-766, IAPR, IAPR, August 2014. [link]
F. Eyben, M. Unfried, G. Hagerer, and B. Schuller, “Automatic Multi-lingual Arousal Detection from Voice Applied to Real Product Testing Applications,” in Proceedings of the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), (New Orleans, LA, USA), pp. 5155-5159, IEEE, IEEE, March 2017. [link]
M. Freitag, S. Amiriparian, N. Cummins, M. Gerczuk, and B. Schuller, “An ‘End-to-Evolution’ Hybrid Approach for Snore Sound Classification,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3507-3511, ISCA, ISCA, August 2017. [link]
T. Geib, M. Schmitt, and B. Schuller, “Automatic Guitar String Detection by String-Inverse Frequency Estimation,” in Proceedings of the GI Workshop “Musik trifft Informatik”, held in conjunction with INFORMATIK 2017, (Chemnitz, Germany), pp. 127-138, GI, GI, September 2017. [link]
J. Guo, K. Qian, H. Xu, C. Janott, B. W. Schuller, and S. Matsuoka, “GPU-Based Fast Signal Processing for Large Amounts of Snore Sound Data,” in Proceedings of the IEEE 5th Global Conference on Consumer Electronics, (GCCE 2016), (Kyoto, Japan), pp. 523-524, IEEE, IEEE, October 2016. [link]
G. Hagerer, N. Cummins, F. Eyben, and B. Schuller, “Robust Laughter Detection for Wearable Wellbeing Sensing,” in Proceedings of the 8th International Digital Health Conference (DH 2018), (Lyon, France), pp. 156-157, ACM, ACM, April 2018. [link]
J. Han, Z. Zhang, F. Ringeval, and B. Schuller, “Reconstruction-error-based Learning for Continuous Emotion Recognition in Speech,” in Proceedings of the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), (New Orleans, LA, USA), pp. 2367-2371, IEEE, IEEE, March 2017. [link]
J. Han, Z. Zhang, F. Ringeval, and B. Schuller, “Prediction-based Learning from Continuous Emotion Recognition in Speech,” in Proceedings of the 42nd IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2017), (New Orleans, LA, USA), pp. 5005-5009, IEEE, IEEE, March 2017. [link]
J. Han, Z. Zhang, M. Schmitt, M. Pantic, and B. Schuller, “From Hard to Soft: Towards more Human-like Emotion Recognition by Modelling the Perception Uncertainty,” in Proceedings of the 25th ACM International Conference on Multimedia, (MM 2017), (Mountain View, CA, USA), pp. 890-897, ACM, ACM, October 2017. [link]
S. Hantke, T. Appel, and B. Schuller, “The Inclusion of Gamification Solutions to Enhance User Enjoyment on Crowdsourcing Platforms,” in Proceedings of the First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia 2018), (Beijing, China), doi: 10.1109/ACIIAsia.2018.8470330, 6 pages, AAAC, IEEE, May 2018. [link]
S. Hantke, N. Cummins, and B. Schuller, “What is my Dog Trying to Tell Me? The Automatic Recognition of the Context and Perceived Emotion of Dog Barks,” in Proceedings of the 43rd IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP 2018), (Calgary, Canada), pp. 5134-5138, IEEE, IEEE, April 2018. [link]
S. Hantke, E. Marchi, and B. Schuller, “Introducing the Weighted Trustability Evaluator for Crowdsourcing Exemplified by Speaker Likability Classification,” in Proceedings of the 10th Language Resources and Evaluation Conference (LREC 2016), (Portorož, Slovenia), pp. 2156-2161, ELRA, ELRA, May 2016. [link]
S. Hantke, T. Olenyi, C. Hausner, and B. Schuller, “VoiLA: An Online Intelligent Speech Analysis and Collection Platform,” in Proceedings of the First Asian Conference on Affective Computing and Intelligent Interaction (ACII Asia 2018), (Beijing, China), doi: 10.1109/ACIIAsia.2018.8470383, 5 pages, AAAC, IEEE, May 2018. [link]
S. Hantke, H. Sagha, N. Cummins, and B. Schuller, “Emotional Speech of Mentally and Physically Disabled Individuals: Introducing the EmotAsS Database and First Findings,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3137-3141, ISCA, ISCA, August 2017. [link]
S. Hantke, M. Schmitt, P. Tzirakis, and B. Schuller, “EAT - The ICMI 2018 Eating Analysis and Tracking Challenge,” in Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018), (Boulder, CO, USA), pp. 559-563, ACM, ACM, October 2018. [link]
S. Hantke, C. Stemp, and B. Schuller, “Annotator Trustability-based Cooperative Learning Solutions for Intelligent Audio Analysis,” in Proceedings of INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, (Hyderabad, India), pp. 3504-3508, ISCA, ISCA, September 2018. [link]
S. Hantke, Z. Zhang, and B. Schuller, “Towards Intelligent Crowdsourcing for Audio Data Annotation: Integrating Active Learning in the Real World,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3951-3955, ISCA, ISCA, August 2017. [link]
G. Keren, J. Deng, J. Pohjalainen, and B. Schuller, “Convolutional Neural Networks and Data Augmentation for Classifying Speakers’ Native Language,” in Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 2393-2397, ISCA, ISCA, September 2016. [link]
G. Keren, T. Kirschstein, E. Marchi, F. Ringeval, and B. Schuller, “End-to-end Learning for Dimensional Emotion Recognition from Physiological Signals,” in Proceedings of the 18th IEEE International Conference on Multimedia and Expo (ICME 2017), (Hong Kong, China), pp. 985-990, IEEE, IEEE, July 2017. [link]
G. Keren, S. Sabato, and B. Schuller, “Tunable Sensitivity to Large Errors in Neural Network Training,” in Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), (San Francisco, CA, USA), pp. 2087-2093, AAAI, February 2017. [link]
G. Keren, S. Sabato, and B. Schuller, “Fast Single-Class Classification and the Principle of Logit Separation,” in Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM 2018), (Singapore, Singapore), pp. 227-236, IEEE, IEEE, November 2018. [link]
G. Keren and B. Schuller, “Convolutional RNN: an Enhanced Model for Extracting Features from Sequential Data,” in Proceedings of the 2016 International Joint Conference on Neural Networks (IJCNN 2016) as part of the IEEE World Congress on Computational Intelligence (IEEE WCCI 2016), (Vancouver, Canada), pp. 3412-3419, INNS/IEEE, IEEE, July 2016. [link]
E. Marchi, B. Schuller, S. Baron-Cohen, O. Golan, S. Bölte, P. Arora, and R. Häb-Umbach, “Typicality and Emotion in the Voice of Children with Autism Spectrum Condition: Evidence Across Three Languages,” in Proceedings of INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, (Dresden, Germany), pp. 115-119, ISCA, ISCA, September 2015. [link]
E. Marchi, F. Vesperini, F. Eyben, S. Squartini, and B. Schuller, “A Novel Approach for Automatic Acoustic Novelty Detection Using a Denoising Autoencoder with Bidirectional LSTM Neural Networks,” in Proceedings of the 40th IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2015), (Brisbane, Australia), pp. 1996-2000, IEEE, IEEE, April 2015. [link]
E. Marchi, F. Vesperini, F. Weninger, F. Eyben, S. Squartini, and B. Schuller, “Non-Linear Prediction with LSTM Recurrent Neural Networks for Acoustic Novelty Detection,” in Proceedings of the 2015 International Joint Conference on Neural Networks (IJCNN 2015), (Killarney, Ireland), 7 pages [no pagination], INNS/IEEE, IEEE, July 2015. [link]
E. Parada-Cabaleiro, A. Baird, A. Batliner, N. Cummins, S. Hantke, and B. Schuller, “The Perception of Emotions in Noisified Nonsense Speech,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3246-3250, ISCA, ISCA, August 2017. [link]
E. Parada-Cabaleiro, A. E. Baird, N. Cummins, and B. Schuller, “Stimulation of Psychological Listener Experiences by Semi-Automatically Composed Electroacoustic Environments,” in Proceedings of the 18th IEEE International Conference on Multimedia and Expo (ICME 2017), (Hong Kong, China), pp. 1051-1056, IEEE, IEEE, July 2017. [link]
E. Parada-Cabaleiro, A. Batliner, A. E. Baird, and B. Schuller, “The SEILS dataset: Symbolically Encoded Scores in Modern-Ancient Notation for Computational Musicology,” in Proceedings of the 18th International Society for Music Information Retrieval Conference (ISMIR 2017), (Suzhou, China), pp. 578-581, ISMIR, ISMIR, October 2017. [link]
E. Parada-Cabaleiro, G. Costantini, A. Batliner, A. Baird, and B. Schuller, “Categorical vs Dimensional Perception of Italian Emotional Speech,” in Proceedings of INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, (Hyderabad, India), pp. 3638-3642, ISCA, ISCA, September 2018. [link]
E. Parada-Cabaleiro, M. Schmitt, A. Batliner, and B. Schuller, “Musical-Linguistic Annotations of Il Lauro Secco,” in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), (Paris, France), pp. 461-467, ISMIR, ISMIR, September 2018. [link]
E. Parada-Cabaleiro, M. Schmitt, A. Batliner, S. Hantke, G. Costantini, K. Scherer, and B. Schuller, “Identifying Emotions in Opera Singing: Implications of Adverse Acoustic Conditions,” in Proceedings of the 19th International Society for Music Information Retrieval Conference (ISMIR 2018), (Paris, France), pp. 376-382, ISMIR, ISMIR, September 2018. [link]
J. Pohjalainen, F. Ringeval, Z. Zhang, and B. Schuller, “Spectral and Cepstral Audio Noise Reduction Techniques in Speech Emotion Recognition,” in Proceedings of the ACM Multimedia Conference (MM 2016), (Amsterdam, The Netherlands), pp. 670-674, ACM, ACM, October 2016. [link]
K. Qian, C. Janott, J. Deng, C. Heiser, W. Hohenhorst, N. Cummins, and B. Schuller, “Snore Sound Recognition: On Wavelets and Classifiers from Deep Nets to Kernels,” in Proceedings of the 39th Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC 2017), (Jeju Island, South Korea), pp. 3737-3740, IEEE, IEEE, July 2017. [link]
K. Qian, C. Janott, Z. Zhang, C. Heiser, and B. Schuller, “Wavelet Features for Classification of VOTE Snore Sounds,” in Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), (Shanghai, China), pp. 221-225, IEEE, IEEE, March 2016. [link]
K. Qian, Z. Zhang, F. Ringeval, and B. Schuller, “Bird Sounds Classification by Large Scale Acoustic Features and Extreme Learning Machine,” in Proceedings of the 3rd IEEE Global Conference on Signal and Information Processing (GlobalSIP 2015), Machine Learning Applications in Speech Processing Symposium, (Orlando, FL, USA), pp. 1317-1321, IEEE, IEEE, December 2015. [link]
F. Ringeval, S. Amiriparian, F. Eyben, K. Scherer, and B. Schuller, “Emotion Recognition in the Wild: Incorporating Voice and Lip Activity in Multimodal Decision-Level Fusion,” in Proceedings of the ICMI 2014 EmotiW – Emotion Recognition in the Wild Challenge and Workshop (EmotiW 2014), Satellite of the 16th ACM International Conference on Multimodal Interaction (ICMI 2014), pp. 473-480, (Istanbul, Turkey), ACM, ACM, November 2014. [link]
F. Ringeval, E. Marchi, M. Mehu, K. Scherer, and B. Schuller, “Face Reading from Speech - Predicting Facial Action Units from Audio Cues,” in Proceedings of INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, (Dresden, Germany), pp. 1977-1981, ISCA, ISCA, September 2015. [link]
F. Ringeval, E. Marchi, C. Grossard, J. Xavier, M. Chetouani, D. Cohen, and B. Schuller, “Automatic Analysis of Typical and Atypical Encoding of Spontaneous Emotion in the Voice of Children,” in Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 1210-1214, ISCA, ISCA, September 2016. [link]
F. Ringeval, B. Schuller, M. Valstar, J. Gratch, R. Cowie, and M. Pantic, “Summary for AVEC 2017: Real-life Depression and Affect Challenge and Workshop,” in Proceedings of the 25th ACM International Conference on Multimedia (MM 2017), (Mountain View, CA), pp. 1963-1964, ACM, ACM, October 2017. [link]
F. Ringeval, B. Schuller, M. Valstar, R. Cowie, and M. Pantic, “AVEC 2015 – The 5th International Audio/Visual Emotion Challenge and Workshop,” in Proceedings of the 23rd ACM International Conference on Multimedia (MM 2015), (Brisbane, Australia), pp. 1335-1336, ACM, ACM, October 2015. [link]
B. Sertolli, N. Cummins, A. Sengur, and B. Schuller, “Deep End-to-End Representation Learning for Food Type Recognition from Speech,” in Proceedings of the 20th ACM International Conference on Multimodal Interaction (ICMI 2018), (Boulder, CO, USA), pp. 574-578, ACM, ACM, October 2018. [link]
M. Schmitt, C. Janott, V. Pandit, K. Qian, C. Heiser, W. Hemmert, and B. Schuller, “A Bag-of-Audio-Words Approach for Snore Sounds’ Excitation Localisation,” in Proceedings of the 14th ITG Conference on Speech Communication, (Paderborn, Germany), ITG-Fachbericht, vol. 267, pp. 230-234, ITG/VDE, IEEE/VDE, October 2016. [ITG link] [IEEE link]
M. Schmitt, E. Marchi, F. Ringeval, and B. Schuller, “Towards Cross-lingual Automatic Diagnosis of Autism Spectrum Condition in Children’s Voices,” in Proceedings of the 14th ITG Conference on Speech Communication, (Paderborn, Germany), ITG-Fachbericht, vol. 267, pp. 264-268, ITG/VDE, IEEE/VDE, October 2016. [ITG link] [IEEE link]
M. Schmitt, F. Ringeval, and B. Schuller, “At the Border of Acoustics and Linguistics: Bag-of-Audio-Words for the Recognition of Emotions in Speech,” in Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 495-499, ISCA, ISCA, September 2016. [link]
M. Schmitt and B. Schuller, “Deep Recurrent Neural Networks for Emotion Recognition in Speech,” in Proceedings of the 44. Jahrestagung für Akustik (DAGA 2018), (Munich, Germany), DEGA, DEGA, March 2018. [link]
M. Schmitt and B. Schuller, “Recognising Guitar Effects - Which Acoustic Features Really Matter?,” in Proceedings of the GI Workshop “Musik trifft Informatik,” held in conjunction with INFORMATIK 2017, (Chemnitz, Germany), pp. 177-190, GI, GI, September 2017. [link]
B. Schuller, "Modelling User Affect and Sentiment in Intelligent User Interfaces [A Tutorial Overview]," in Proceedings of the 20th International Conference on Intelligent User Interfaces (IUI 2015), (Atlanta, GA, USA), pp. 443-446, ACM, ACM, March/April 2015. [link]
B. Schuller, “Speech Analysis in the Big Data Era,” in Text, Speech, and Dialogue – Proceedings of the 18th International Conference on Text, Speech and Dialogue, TSD 2015, satellite event of INTERSPEECH 2015, invited contribution (Pilsen, Czech Republic), Lecture Notes in Computer Science (LNCS) series, vol. 9302, pp. 3-11, TSD, Springer, September 2015. [link]
B. Schuller, “Big Data, Deep Learning – At the Edge of X-Ray Speaker Analysis,” in Proceedings of the 19th International Conference on Speech and Computer (SPECOM 2017), (Hatfield, UK), Lecture Notes in Computer Science (LNCS) series, vol. 10458, pp. 20-34, SPECOM, Springer, September 2017. [link]
B. Schuller, F. Friedmann, and F. Eyben, “The Munich BioVoice Corpus: Effects of Physical Exercising, Heart Rate, and Skin Conductance on Human Speech Production,” in Proceedings of the 9th Language Resources and Evaluation Conference (LREC 2014), (Reykjavik, Iceland), pp. 1506-1510, ELRA, ELRA, May 2014. [link]
B. Schuller, S. Steidl, A. Batliner, S. Hantke, F. Hönig, J. R. Orozco-Arroyave, E. Nöth, Y. Zhang, and F. Weninger, “The INTERSPEECH 2015 Computational Paralinguistics Challenge: Degree of Nativeness, Parkinson’s & Eating Condition,” in Proceedings of INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, (Dresden, Germany), pp. 478-482, ISCA, ISCA, September 2015. [link]
B. Schuller, S. Steidl, A. Batliner, E. Bergelson, J. Krajewski, C. Janott, A. Amatuni, M. Casillas, A. Seidl, M. Soderstrom, A. Warlaumont, G. Hidalgo, S. Schnieder, C. Heiser, W. Hohenhorst, M. Herzog, M. Schmitt, K. Qian, Y. Zhang, G. Trigeorgis, P. Tzirakis, and S. Zafeiriou, “The INTERSPEECH 2017 Computational Paralinguistics Challenge: Addressee, Cold & Snoring,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3442-3446, ISCA, ISCA, August 2017. [link]
B. Schuller, S. Steidl, A. Batliner, J. Epps, F. Eyben, F. Ringeval, E. Marchi, and Y. Zhang, “The INTERSPEECH 2014 Computational Paralinguistics Challenge: Cognitive & Physical Load,” in Proceedings of INTERSPEECH 2014, 15th Annual Conference of the International Speech Communication Association, (Singapore, Singapore), pp. 427-431, ISCA, ISCA, September 2014. [link]
B. Schuller, S. Steidl, A. Batliner, J. Hirschberg, J. K. Burgoon, A. Baird, A. Elkins, Y. Zhang, E. Coutinho, and K. Evanini, “The INTERSPEECH 2016 Computational Paralinguistics Challenge: Deception, Sincerity & Native Language,” in Proceedings of INTERSPEECH 2016, 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 2001-2005, ISCA, ISCA, September 2016. [link]
B. Schuller, S. Steidl, A. Batliner, P. B. Marschik, H. Baumeister, F. Dong, S. Hantke, F. Pokorny, E.-M. Rathner, K. D. Bartl-Pokorny, C. Einspieler, D. Zhang, A. Baird, S. Amiriparian, K. Qian, Z. Ren, M. Schmitt, P. Tzirakis, and S. Zafeiriou, “The INTERSPEECH 2018 Computational Paralinguistics Challenge: Atypical & Self-Assessed Affect, Crying & Heart Beats,” in Proceedings of INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, (Hyderabad, India), pp. 122-126, ISCA, ISCA, September 2018. [link]
G. Trigeorgis, F. Ringeval, R. Brückner, E. Marchi, M. Nicolaou, B. Schuller, and S. Zafeiriou, “Adieu Features? End-to-End Speech Emotion Recognition using a Deep Convolutional Recurrent Network,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), (Shanghai, China), pp. 5200-5204, IEEE, IEEE, March 2016. [link]
M. Valstar, B. Schuller, J. Krajewski, R. Cowie, and M. Pantic, “AVEC 2014: the 4th International Audio/Visual Emotion Challenge and Workshop,” in Proceedings of the 22nd ACM International Conference on Multimedia (MM 2014), (Orlando, FL, USA), pp. 1243-1244, ACM, ACM, November 2014. [link]
B. Vlasenko, H. Sagha, N. Cummins, and B. Schuller, “Implementing Gender-dependent Vowel-level Analysis for Boosting Speech-based Depression Recognition,” in Proceedings of INTERSPEECH 2017, 18th Annual Conference of the International Speech Communication Association, (Stockholm, Sweden), pp. 3266-3270, ISCA, ISCA, August 2017. [link]
F. Weninger, F. Ringeval, E. Marchi, and B. Schuller, “Discriminatively Trained Recurrent Neural Networks for Continuous Dimensional Emotion Recognition from Audio,” in Proceedings of the 25th International Joint Conference on Artificial Intelligence (IJCAI 2016), (New York City, NY, USA), pp. 2196-2202, IJCAI, IJCAI, July 2016. [link]
X. Xu, J. Deng, M. Gavryukova, Z. Zhang, L. Zhao, and B. Schuller, “Multiscale Kernel Locally Penalised Discriminant Analysis Exemplified by Emotion Recognition in Speech,” in Proceedings of the 18th ACM International Conference on Multimodal Interaction, (ICMI 2016), (Tokyo, Japan), pp. 233-237, ACM, ACM, November 2016. [link]
X. Xu, J. Deng, W. Zheng, L. Zhao, and B. Schuller, “Dimensionality Reduction for Speech Emotion Features by Multiscale Kernels,” in Proceedings of INTERSPEECH 2015, 16th Annual Conference of the International Speech Communication Association, (Dresden, Germany), pp. 1532-1536, ISCA, ISCA, September 2015. [link]
Y. Zhang, E. Coutinho, Z. Zhang, C. Quan, and B. Schuller, “Dynamic Active Learning Based on Agreement and Applied to Recognition of Emotions in Spoken Interactions,” in Proceedings of the 17th International Conference on Multimodal Interaction, (ICMI 2015), (Seattle, WA, USA), pp. 275-278, ACM, ACM, November 2015. [link]
Y. Zhang, E. Coutinho, Z. Zhang, M. Adam, and B. Schuller, “On Rater Reliability and Agreement Based Dynamic Active Learning,” in Proceedings of the 6th International Conference on Affective Computing and Intelligent Interaction (ACII 2015), (Xi’an, China), pp. 70-76, AAAC, IEEE, September 2015. [link]
Y. Zhang, Y. Zhou, J. Shen, and B. Schuller, “Semiautonomous Data Enrichment Based on Cross-task Labelling of Missing Targets for Holistic Speech Analysis,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), (Shanghai, China), pp. 6090-6094, IEEE, IEEE, March 2016. [link]
Z. Zhang, J. Han, K. Qian, and B. Schuller, “Evolving Learning for Analysing Mood-Related Infant Vocalisation,” in Proceedings of INTERSPEECH 2018, 19th Annual Conference of the International Speech Communication Association, (Hyderabad, India), pp. 142-146, ISCA, ISCA, September 2018. [link]
Z. Zhang, F. Ringeval, B. Dong, E. Coutinho, E. Marchi, and B. Schuller, “Enhanced Semi-Supervised Learning for Multimodal Emotion Recognition,” in Proceedings of the 41st IEEE International Conference on Acoustics, Speech, and Signal Processing, (ICASSP 2016), (Shanghai, China), pp. 5185-5189, IEEE, IEEE, March 2016. [link]
Z. Zhang, F. Ringeval, J. Han, J. Deng, E. Marchi, and B. Schuller, “Facing Realism in Spontaneous Emotion Recognition from Speech: Feature Enhancement by Autoencoder with LSTM Neural Networks,” in Proceedings of INTERSPEECH 2016, the 17th Annual Conference of the International Speech Communication Association, (San Francisco, CA, USA), pp. 3593-3597, ISCA, ISCA, September 2016. [link]
Z. Zhang, F. Weninger, M. Wöllmer, J. Han, and B. Schuller, “Towards Intoxicated Speech Recognition,” in Proceedings of the 30th International Joint Conference on Neural Networks (IJCNN 2017), (Anchorage, AK), pp. 1555-1559, IEEE, IEEE, May 2017. [link]
S. Amiriparian, N. Cummins, S. Ottl, M. Gerczuk, and B. Schuller, “Sentiment Analysis Using Image-based Deep Spectrum Features,” in Proceedings of the 2nd International Workshop on Automatic Sentiment Analysis in the Wild (WASA) held in conjunction with the 7th International Conference on Affective Computing and Intelligent Interaction (ACII 2017), (San Antonio, TX), pp. 26-29, AAAC, IEEE, October 2017. [link]
S. Amiriparian, M. Freitag, N. Cummins, and B. Schuller, “Feature Selection in Multimodal Continuous Emotion Prediction,” in Proceedings of the 2nd International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2017) held in conjunction with the 7th International Conference on Affective Computing and Intelligent Interaction (ACII 2017), (San Antonio, TX), pp. 30-37, AAAC, IEEE, October 2017. [link]
S. Amiriparian, M. Freitag, N. Cummins, and B. Schuller, “Sequence to Sequence Autoencoders for Unsupervised Representation Learning from Audio,” in Proceedings of the 2nd Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2017), satellite to EUSIPCO 2017 (T. Virtanen, A. Mesaros, T. Heittola, A. Diment, E. Vincent, E. Benetos, and B. Martinez Elizalde, eds.), (Munich, Germany), pp. 17-21, DCASE, Tampere University of Technology Laboratory of Signal Processing, November 2017. [link]
S. Amiriparian, N. Cummins, M. Freitag, K. Qian, R. Zhao, V. Pandit and B. Schuller, “The Combined Augsburg / Passau / TUM / ICL System for DCASE 2017,” in Technical Reports from the 2nd Detection and Classification of Acoustic Scenes and Events Challenge (DCASE 2017), satellite to EUSIPCO 2017, (Munich, Germany), 1 page [no pagination], DCASE, DCASE, November 2017. [link]
E. Coutinho, F. Weninger, K. Scherer, and B. Schuller, “The Munich LSTM-RNN Approach to the MediaEval 2014 “Emotion in Music” Task,” in Proceedings of the MediaEval 2014 Multimedia Benchmark Workshop (M. Larson, B. Ionescu, X. Anguera, M. Eskevich, P. Korshunov, M. Schedl, M. Soleymani, G. Petkos, R. Sutcliffe, J. Choi, and G. J. Jones, eds.), (Barcelona, Spain), vol. 1263, 2 pages [no pagination], MediaEval, CEUR, October 2014. [link]
J. Deng, X. Xu, Z. Zhang, S. Frühholz, D. Grandjean, and B. Schuller, “Fisher Kernels on Phase-based Features for Speech Emotion Recognition,” in Proceedings of the Seventh International Workshop on Spoken Dialogue Systems (IWSDS 2016), (Saariselkä, Finland), Dialogues with Social Robots (K. Jokinen and G. Wilcock, eds.), Lecture Notes in Electrical Engineering series, vol. 427, pp. 195-203, IWSDS, Springer, January 2016. [link]
B. Dong, Z. Zhang, and B. Schuller, “Empirical Mode Decomposition: A Data-Enrichment Perspective on Speech Emotion Recognition,” in Proceedings of the 6th International Workshop on Emotion and Sentiment Analysis (ESA 2016), satellite of the 10th Language Resources and Evaluation Conference (LREC) (J. Sanchez-Rada, C. A. Iglesias, B. Schuller, G. Vulcu, P. Buitelaar, and L. Devillers, eds.), (Portorož, Slovenia), pp. 71-75, ELRA, ELRA, May 2016. [link]
S. Hantke, A. Batliner, and B. Schuller, “Ethics for Crowdsourced Corpus Collection, Data Annotation and its Application in the Web-based Game iHEARu-PLAY,” in Proceedings of the 1st International Workshop on ETHics In Corpus Collection, Annotation and Application (ETHI-CA2 2016), satellite of the 10th Language Resources and Evaluation Conference (LREC), (Portorož, Slovenia), pp. 54-59, ELRA, ELRA, May 2016. [link]
S. Hantke, F. Eyben, T. Appel, and B. Schuller, “iHEARu-PLAY: Introducing a game for crowdsourced data collection for affective computing,” in Proceedings of the 1st International Workshop on Automatic Sentiment Analysis in the Wild (WASA 2015) held in conjunction with the 6th biannual Conference on Affective Computing and Intelligent Interaction (ACII 2015), (Xi’an, China), pp. 891–897, AAAC, IEEE, September 2015. [link]
A. Mallol-Ragolta, M. Schmitt, A. Baird, N. Cummins, and B. Schuller, “Performance Analysis of Unimodal and Multimodal Models in Valence-Based Empathy Recognition,” in Proceedings of the OMG-Empathy Prediction Challenge Workshop, held in conjunction with the 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), (Lille, France), IEEE, IEEE, May 2019. 5 pages, to appear [link not yet available]
E. Marchi, D. Tonelli, X. Xu, F. Ringeval, J. Deng, S. Squartini, and B. Schuller, “Pairwise Decomposition with Deep Neural Networks and Multiscale Kernel Subspace Learning for Acoustic Scene Classification,” in Proceedings of the 1st Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2016), satellite to EUSIPCO 2016 (T. Virtanen, A. Mesaros, T. Heittola, M.D. Plumbley, P. Foster, E. Benetos, and M. Lagrange, eds.), (Budapest, Hungary), pp. 1-5, DCASE, Tampere University of Technology Laboratory of Signal Processing, September 2016. [link]
E. Parada-Cabaleiro, A. Baird, A. Batliner, N. Cummins, S. Hantke, and B. Schuller, “The Perception of Emotion in the Singing Voice: The Understanding of Music Mood for Music Organisation,” in Proceedings of the 4th International Digital Libraries for Musicology workshop (DLfM 2017), satellite event of ISMIR 2017, (Shanghai, China), pp. 29-36, ISMIR, ACM, October 2017. [link]
K. Qian, Z. Ren, V. Pandit, Z. Yang, Z. Zhang, and B. Schuller, “Wavelets Revisited for the Classification of Acoustic Scenes,” in Proceedings of the 2nd Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2017), satellite to EUSIPCO 2017 (T. Virtanen, A. Mesaros, T. Heittola, A. Diment, E. Vincent, E. Benetos, and B. Martinez Elizalde, eds.), (Munich, Germany), pp. 108-112, DCASE, Tampere University of Technology Laboratory of Signal Processing, November 2017. [link]
Z. Ren, Q. Kong, K. Qian, and B. Schuller, “Attention-based Convolutional Neural Networks for Acoustic Scene Classification,” in Proceedings of the 3rd Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2018) (M.D. Plumbley, C. Kroos, J.P. Bello, G. Richard, D.P.W. Ellis, and A. Mesaros, eds.), (Surrey, UK), pp. 39-43, DCASE, Tampere University of Technology Laboratory of Signal Processing, November 2018. [link]
Z. Ren, V. Pandit, K. Qian, Z. Yang, Z. Zhang, and B. Schuller, “Deep Sequential Image Features for Acoustic Scene Classification,” in Proceedings of the 2nd Detection and Classification of Acoustic Scenes and Events Workshop (DCASE 2017) (T. Virtanen, A. Mesaros, T. Heittola, A. Diment, E. Vincent, E. Benetos, and B. Martinez Elizalde, eds.), (Munich, Germany), pp. 113-117, DCASE, Tampere University of Technology Laboratory of Signal Processing, November 2017. [link]
F. Ringeval, B. Schuller, M. Valstar, R. Cowie, H. Kaya, M. Schmitt, S. Amiriparian, N. Cummins, D. Lalanne, A. Michaud, E. Ciftci, H. Gülec, A. A. Salah, and M. Pantic, “AVEC 2018 Workshop and Challenge: Bipolar Disorder and Cross-Cultural Affect Recognition,” in Proceedings of the 8th International Workshop on Audio/Visual Emotion Challenge, AVEC’18, co-located with the 26th ACM International Conference on Multimedia, (MM 2018), (Seoul, South Korea), pp. 3-13, ACM, ACM, October 2018. [link]
F. Ringeval, B. Schuller, M. Valstar, J. Gratch, R. Cowie, S. Scherer, S. Mozgai, N. Cummins, M. Schmitt, and M. Pantic, “AVEC 2017 – Real-life Depression, and Affect Recognition Workshop and Challenge,” in Proceedings of the 7th International Workshop on Audio/Visual Emotion Challenge (AVEC 2017), co-located with the 25th ACM International Conference on Multimedia (MM 2017), (Mountain View, CA), pp. 3-9, ACM, ACM, October 2017. [link]
F. Ringeval, B. Schuller, M. Valstar, S. Jaiswal, E. Marchi, D. Lalanne, R. Cowie, and M. Pantic, “AV+EC 2015 - The First Affect Recognition Challenge Bridging Across Audio, Video, and Physiological Data,” in Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC15, co-located with the 23rd ACM International Conference on Multimedia (MM 2015), (Brisbane, Australia), pp. 3–8, ACM, ACM, October 2015. [link]
H. Sagha, E. Coutinho, and B. Schuller, “Exploring the Importance of Individual Differences in the Prediction of Emotions induced by Music,” in Proceedings of the 5th International Workshop on Audio/Visual Emotion Challenge, AVEC15, co-located with the 23rd ACM International Conference on Multimedia (MM 2015) (F. Ringeval, B. Schuller, M. Valstar, R. Cowie, and M. Pantic, eds.), (Brisbane, Australia), pp. 57–63, ACM, ACM, October 2015. [link]
B. Schuller, J-G. Ganascia, and L. Devillers, “Multimodal Sentiment Analysis in the Wild: Ethical Considerations on Data Collection, Annotation, and Exploitation,” in Proceedings of the Workshop on Ethics in Corpus Collection, Annotation & Application (ETHI-CA2 2016) (Portorož, Slovenia), pp. 29-34, ELRA, ELRA, May 2016. [link]
B. Schuller and M. McTear, “Sociocognitive Language Processing – Emphasising the Soft Factors,” in Proceedings of the Seventh International Workshop on Spoken Dialogue Systems (IWSDS 2016), (Saariselkä, Finland), 6 pages [no pagination], January 2016. [no link available]
B. Schuller, Y. Zhang, F. Eyben, and F. Weninger, “Intelligent systems’ Holistic Evolving Analysis of Real-life Universal speaker characteristics,” in Proceedings of the 5th International Workshop on Emotion Social Signals, Sentiment & Linked Open Data (ES3 LOD 2014), satellite of the 9th Language Resources and Evaluation Conference (LREC 2014) (B. Schuller, P. Buitelaar, L. Devillers, C. Pelachaud, T. Declerck, A. Batliner, P. Rosso, and S. Gaines, eds.), (Reykjavik, Iceland), pp. 14–20, ELRA, ELRA, May 2014. [link]
G. Trigeorgis, E. Coutinho, F. Ringeval, E. Marchi, S. Zafeiriou, and B. Schuller, “The ICL-TUM-PASSAU approach for the MediaEval 2015 “Affective Impact of Movies” Task,” in Proceedings of the MediaEval 2015 Multimedia Benchmark Workshop, satellite of INTERSPEECH 2015 (M. Larson, B. Ionescu, M. Sjöberg, X. Anguera, J. Poignant, M. Riegler, M. Eskevich, C. Hauff, R. Sutcliffe, G. J. Jones, Y.-H. Yang, M. Soleymani, and S. Papadopoulos, eds.), (Wurzen, Germany), vol. 1436, 3 pages [no pagination], MediaEval, CEUR, September 2015. [link]
M. Valstar, J. Gratch, B. Schuller, F. Ringeval, D. Lalanne, M. Torres Torres, S. Scherer, G. Stratou, R. Cowie, and M. Pantic, “AVEC 2016: Depression, Mood, and Emotion Recognition Workshop and Challenge,” in Proceedings of the 6th International Workshop on Audio/Visual Emotion Challenge (AVEC 2016), co-located with the 24th ACM International Conference on Multimedia, (MM 2016), (Amsterdam, The Netherlands), pp. 3-10, ACM, ACM, October 2016. [link]
B. Vlasenko, B. Schuller, and A. Wendemuth, “Tendencies Regarding the Effect of Emotional Intensity in Inter Corpus Phoneme-Level Speech Emotion Modelling,” in Proceedings of the 2016 IEEE International Workshop on Machine Learning for Signal Processing (MLSP 2016), (Salerno, Italy), 6 pages [no pagination], IEEE, IEEE, September 2016. [link]
Y. Zhang, E. Coutinho, Z. Zhang, C. Quan, and B. Schuller, “Agreement-based Dynamic Active Learning with Least and Medium Certainty Query Strategy,” in Proceedings of the Advances in Active Learning: Bridging Theory and Practice Workshop, held in conjunction with the 32nd International Conference on Machine Learning (ICML 2015) (A. Krishnamurthy, A. Ramdas, N. Balcan, and A. Singh, eds.), (Lille, France), 5 pages [no pagination], IMLS, IMLS, July 2015. [link]
Last Updated: February 2019