
Laboratory Notebook 13-October-02

 

EVPMaker - some comments

EVP Research

Anyone undertaking research in EVP should have at least some background in electronics, in speech and hearing, and in phenomena as a subject (it is, after all, called the Electronic Voice Phenomenon) - adding acoustics, phonetics, psycho-acoustics, statistics and the use of relevant instrumentation.

With the advent of CoolEdit and EVPMaker we should add to that list some understanding of information technology, of software, and of the mathematics applied to waveforms, spectra, cepstra and quefrencies and their manipulation - Fourier and Laplace transforms - all of these being topics applicable to this kind of research.

And, in addition, one should have a good factual knowledge of the history of EVP research - I say "factual" as (almost without exception) anything published purporting to be a "history of EVP" is both partial and far from impartial.

It would be a good idea also to have an understanding of the various beliefs and practices in the spiritual area, particularly the empirical research carried out by such people as Robert Monroe and Emanuel Swedenborg, and Tibetan Lamas.

Useful too would be an understanding of the processes of the mind and such disabilities as schizophrenia and auditory hallucinations.

EVPMaker - this is a work of pure genius by Stefan Bion based on an idea of Dr. Fidelio Koberle. Like every success in a small field it has aroused unwarranted and ill-motivated criticism.

Some notes on experiments with EVPMaker are posted on my website - www.aspsite.tripod.com.

To the best of my knowledge no psycho-acoustic research has been done on the effect of subjecting the ear and listening mechanism to a long series of allophones.

(An allophone is a basic speech segment. Examples of allophones - the 's' sound that begins the word speech - and the 's' sound that ends the word speeches. Strictly speaking - and we have to be accurate in this business - the two 's' sounds are not the same - each is a different allophone. Listen carefully to them and you should notice that they are different).

Unlike some other EVP methods, with EVPMaker there is no doubt that one is hearing speech sounds - EVPMaker takes speech as spoken normally and then splits the sequence of sounds into a number of small "slices" which are then randomised. For example, the phrase "Fish don't sing" may come out as "-ong-t idnosh if". "Idnosh" is a 5-allophone word which in itself does not mean anything, but someone could interpret it as "eat-nosh" - (where 'nosh' is Cockney for food) - so now it makes "sense" - and that is the job of the brain - to make sense of things, to connect up the apparently unconnected.
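The slice-and-randomise step can be pictured with a toy text-level sketch. This is only an analogy - EVPMaker works on audio samples, not letters, and the slice length and random seed below are purely illustrative:

```python
import random

def scramble(text, slice_len=2, seed=None):
    """Split text into fixed-length slices and shuffle them -
    a crude text analogy of EVPMaker's audio randomisation."""
    rng = random.Random(seed)
    slices = [text[i:i + slice_len] for i in range(0, len(text), slice_len)]
    rng.shuffle(slices)
    return "".join(slices)

# Every character survives the shuffle; only the order is destroyed.
print(scramble("fish dont sing", slice_len=2, seed=1))
```

The output is a jumble containing exactly the original sounds (here, letters), which is why listeners still hear unmistakable speech material in it.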

Now let's look at some ballpark figures. If we take a meaningful EVP word as typically consisting of five allophones, and there are considered to be a total of 32 allophones in English (including stops - periods of silence), then the total number of ordered selections of five distinct allophones is 32!/27! = 24,165,120.

And if we take the total vocabulary of common words in English as (a generous) 40,000, then any EVPMaker 5-allophone (5-al) word would have a 40,000 in 24,165,120 chance of actually being a word in English.

That works out as a 1 in 604 chance of any 5-al EVPMaker word being in English. By chance, only about one word in 600 should be English; the rest should be nonsense words or words in other languages.

If we run a 25-second EVPMaker session, (and say a 5-al word has a duration of 1 second on average), then we should get an average of 25 such words per session.

And in that case a 5-allophone English word should turn up by chance only once in every 604/25 sessions.

That works out to only one English word in 24 sessions. Experience shows that the yield is consistently much higher than that. (See also the definition of 'yield' given in my notes on the experiments posted on my website).
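The ballpark chain above can be checked in a few lines of Python, using exactly the figures assumed in the text - five allophones per word, a 32-allophone inventory, 40,000 common words, and 25 words per 25-second session:

```python
from math import factorial

# Ballpark figures as assumed in the text.
total_5al_words = factorial(32) // factorial(27)  # ordered picks of 5 from 32
english_vocab = 40_000                            # generous count of common English words
words_per_session = 25                            # 25-second session, ~1 s per 5-al word

odds_per_word = total_5al_words / english_vocab   # chance words per English hit
sessions_per_hit = odds_per_word / words_per_session

print(total_5al_words, round(odds_per_word), round(sessions_per_hit))
# 24165120 604 24
```

One English word per 24 sessions is the chance baseline against which the observed yield should be compared.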

So - there is something happening. There are a lot more English words than one in 24 sessions. There are a lot more English words than there should be.

However, the interpretation of what one hears with EVPMaker is much less well defined than with some other EVP methods.

There is a psycho-acoustic concept called "cueing" - which no one else in the world of EVP seems to be aware of (which is appalling). Cueing is vital to the understanding of speech - a lack of cueing apprehension is the cause of some types of deafness - and the nature of these forms of deafness and the problems of intelligibility that we face in EVP are closely related. One does not have to dive into pseudo-scientific bull like "resonance" and "vibratory rates" to understand these problems. Unfortunately pseudo-science is often swallowed whole - the drivers for it are usually personal ambition allied to laziness, or a lack of intellectual rigor sufficient to actually study the subject - it is so much easier to invent ideas of "mis-tuning" or "lack of resonance".

Cueing is covered extensively in the literature - in the learned journals, the Journal of the Acoustical Society of America (JASA), for example. In the Appendix you will find a list of papers on current research in cueing.

What cueing means is that what you hear is determined to a large extent by the cues your ear picks up as to when it should start listening - when an allophone begins and when it ends, when a word begins and when a word ends, when a phrase begins and when it ends.

In sessions longer than, say, nine seconds this essential process is not possible with EVPMaker, due to the destruction of cues and the general cacophony. This tends to lead to subjective interpretations - often drawn directly from the subconscious. It could be quite a good audible Rorschach test. This is possibly the source of the audio-nasties reported as sometimes heard with EVPMaker. One does not have to invent pseudo-science bull-terminology to understand that.

The following abstract is of interest to us doing research in EVP. It is from work done at UCL (University College London) by Marta Ortega, Valerie Hazan and Mark Huckvale on the enhancement of the intelligibility of speech by cue-enhancement.

{Start of quote}

Abstract

In previous work, 'cue-enhancement' was found to significantly increase the intelligibility of speech in noise. However, the practical application of the technique was limited by the fact that the regions of the speech signal to be enhanced needed to be manually labelled.

The principal aim of this project was therefore to automate the identification and enhancement of 'landmark' regions containing a high density of acoustic cues and to demonstrate improvements in intelligibility at least equal to that obtained for manually-enhanced materials.

We have implemented a technique for automatic cue-enhancement via the automatic identification of potential enhancement regions (PERs), and evaluated intelligibility for automatically-enhanced speech, relative to natural or manually-enhanced speech.

Little loss in intelligibility was seen between the manually-tagged and automatically enhanced materials. However, there was little evidence of statistically-significant improvements as a result of the enhancements. This may have been due in part to the fact that amplification levels across consonantal regions had to be standardised, due to the limitations of the automatic tagging. {End of quote}

OK - now, the research does require some extra knowledge as I have mentioned. For example, here is a quote from inside the paper abstracted above.

Signal processing

2.1 Estimation of potential enhancement regions (PERs)

The process for the automatic identification and location of regions for enhancement was based on a broad-class hidden-Markov model classifier described in Huckvale (1997). Briefly, this classifier uses a mel-scale cepstral coefficient acoustic vector and six context-free HMMs with three states and five Gaussian mixtures. The six models represented silence (SIL), vocalic regions (VOC), fricative regions (FRC), nasal regions (NAS), stop-gaps (GAP) and stop-aspiration (ASP). A bigram phone language model was used with the hard constraint that ASP events could only occur after GAP events. Potential enhancement regions were recovered by rule from the recognised transcription.
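The paper's hard constraint - ASP events may only occur after GAP events - is easy to picture as a bigram transition table with the forbidden entries zeroed out. The sketch below is hypothetical: it is not the UCL code, and the uniform starting scores merely stand in for trained probabilities:

```python
# Hypothetical sketch (not the UCL implementation): encode the constraint
# that stop-aspiration (ASP) may only follow a stop-gap (GAP) as zeroed
# entries in a bigram transition table over the six broad classes.
classes = ["SIL", "VOC", "FRC", "NAS", "GAP", "ASP"]
idx = {c: i for i, c in enumerate(classes)}

# Start from a uniform (untrained) table of bigram scores.
trans = [[1.0] * 6 for _ in range(6)]
for row in range(6):
    trans[row][idx["ASP"]] = 0.0      # forbid transitions into ASP...
trans[idx["GAP"]][idx["ASP"]] = 1.0   # ...except from GAP

# Renormalise each row into a probability distribution.
trans = [[v / sum(row) for v in row] for row in trans]

print(trans[idx["VOC"]][idx["ASP"]])  # VOC -> ASP is impossible: 0.0
```

A recogniser using this table can never hypothesise an ASP segment except immediately after a GAP, which is exactly what the "hard constraint" in the quoted passage means.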

Any unfamiliar words are scientific or mathematical terms used in this area of research.

 

To research EVP with EVPMaker -

OK folks - I have to concentrate on Alpha research, on doing the R&D needed to have a new and upgraded unit ready for Faraday Cage trials - already one month behind schedule, on re-writing my paper on EVP for the SPR, and on re-writing and re-formatting my book about the Alpha for publication. So this is as far as I can take it at the moment. Hopefully my comments have been useful - and, if you have not already done so, looking over some of the following reading material may also be useful.

 

Click below to view the next report - which may be rather interesting....

Next Report

Appendix

References

Benoit, C., Grice, M. & Hazan, V. (1996) The SUS test: a method for the assessment of text-to-speech synthesis intelligibility using Semantically Unpredictable Sentences. Speech Communication, 18, 381-392.

Bunnell, H.T. (1990) On enhancement of spectral contrast in speech for hearing-impaired listeners. Journal of the Acoustical Society of America, 88, 2546-2556.

Bradlow A.R., Pisoni D.B. (1999) Recognition of spoken words by native and non-native listeners: Talker-, listener-, and item-related factors. Journal of the Acoustical Society of America, 106, 2074-2085.

Colotte, V. and Laprie, Y. (2000) Automatic enhancement of speech intelligibility. In IEEE International Conference on Acoustics, Speech, and Signal Processing -ICASSP'2000, Istanbul

Foster, J.R. and Haggard, M.P. (1979) An efficient analytical test of speech perception. Proc. IoA, IA3, 9-12.

Hazan, V. and Simpson, A. (1998) The effect of cue-enhancement on the intelligibility of nonsense word and sentence materials presented in noise. Speech Communication, 24, 211-226.

Hazan, V. and Simpson, A. (2000). The effect of cue-enhancement on consonant intelligibility in noise: speaker and listener effects. Language and Speech, 43 (3), 273-295.

Hazan, V., Simpson, A. and Huckvale, M. (1998) Enhancement techniques to improve the intelligibility of consonants in noise: Speaker and listener effects. Proc. ICSLP, Sydney, Australia, December 1998, 5, 2163-2167.

Huckvale, M. (1997) A syntactic pattern recognition method for the automatic location of potential enhancement regions in running speech. Speech, Hearing and language: UCL Work in Progress. http://www.phon.ucl.ac.uk/home/mark/papers/shl97.pdf

Liu, S. (1994) Landmark detection of distinctive feature-based speech recognition. JASA, 96, 5, Part 2, 3227.

Merzenich, M.M., Jenkins, W.M., Johnston, P., Schreiner, C., Miller, S., Tallal, P. (1996) Language Comprehension in Language-Learning Impaired Children Improved with Acoustically Modified Speech. Science, 271.

Ortega, M. and Hazan, V. (1999) Enhancing acoustic cues to aid L2 speech perception. Proc.ICPhS, San Francisco, 1-7 August 1999, 1, 117-120.

Revoile, S.G., Holden-Pitt, L., Pickett, J.M., Brandt, F. (1986) Speech cue enhancement for the hearing impaired: I. Altered vowel durations for perception of final fricative voicing. Journal of Speech and Hearing Research, 29, 240-255.

Robinson, T., Hochberg, M. & Renals, S. (1996) The use of recurrent neural networks in continuous speech recognition. In Automatic Speech and Speaker Recognition - Advanced Topics (Lee, Paliwal and Soong, editors), Kluwer Academic Publishers.

A second set of references is given below - there may be some duplication between the two.

Abberton, E. (1998) Book Review. Vihman, M. M., Phonological Development: the Origins of Language in the Child. Blackwell, Oxford and Cambridge MA (1996). In Clinical Linguistics and Phonetics, 12, 2, 149-152.

Abberton, E., Hu, X. & Fourcin, A. (1998) "Real-time speech pattern element displays for interactive therapy". International Journal of Language and Communication Disorders, 33, 292-297.

Abberton, E. & Carlson, E. (1999) "How I use computers in voice therapy" Speech and Language Therapy in Practice, Summer 1999, 26-27.

Abberton, E. (1999) Book Review, Hardcastle, W. & Laver, J. Eds. The Handbook of Phonetic Sciences Oxford: Blackwell. Clinical Linguistics and Phonetics 13,3, 244- 245.

Adlard, A. & Hazan, V. (1998) "Speech perception abilities in children with developmental dyslexia". Quarterly Journal of Experimental Psychology: Section A. vol 51A, 153-177.

Baker, R. J., Rosen, S. & Darling, A. M. (1998) "An efficient characterisation of human auditory filtering across level and frequency that is also physiologically reasonable". In A. R. Palmer, A. Rees, A. Q. Summerfield & R. Meddis (Eds.), Psychophysical and Physiological Advances in Hearing (pp. 81-88). London: Whurr.

Bloothooft, G., van Dommelen, W., Espain, C., Hazan, V., Huckvale, M., & Wigforss, E. (1998). The landscape of future education in speech communication sciences: Proposals. OTS, Utrecht, 148 pp.

Bowerman, C., Eriksson, A., Huckvale, M., Rosner, M., Tatham, M.& Wolters, M., (1999) "Criteria for Evaluating Internet Tutorials in Speech Communication Sciences", Proc. EuroSpeech-99, Budapest, pp 2455-2458.

Byng, S. & Black, M. (1998) "The Reversible Sentence Comprehension Test". In J. Marshall, M. Black and S. Byng (eds) The Sentence Processing Resource Pack. London: Winslow Press.

Chung, H. & Huckvale, M . (1999) "Modeling of Temporal Compression in Korean". In Harvard Studies in Korean Linguistics VIII--Proceedings of the 1999 Harvard International Symposium on Korean Linguistics, 16 July-18 July, Cambridge, U.S.A. Cambridge: Department of Linguistics, Harvard University.

Chung, H., Huckvale, M. & Kim, K. (1999) "A New Korean Speech Synthesis System and Temporal Model". In Proceedings of 16th International Conference on Speech Processing, 18 Aug-20 Aug, Seoul, Korea, vol.1, 203-208.

Chung, H., Kim, K. & Huckvale, M. (1999) "Consonantal and Prosodic Influences on Korean Vowel Duration". In Proceedings of EuroSpeech-99, 5 Sept-10 Sept, Budapest, Hungary, vol. 2, 707-710.

Fang, A. C., House, J. & Huckvale, M. (1998) "Investigating the Syntactic Characteristics of English Tone Units". In Proceedings of the International Conference of Spoken Language Processing, 30 Nov-4 Dec, Sydney, Australia.

Faulkner, A. , & Rosen, S (1999) "Contributions of temporal encodings of voicing, voicelessness, fundamental frequency and amplitude variation in audio-visual and auditory speech perception", Journal of the Acoustical Society of America, 106, 2063-2073

Hawkins, S., House, J., Huckvale, M. , Local, J. & Ogden, R. (1998) "ProSynth: An integrated prosodic approach to device-independent, natural-sounding speech synthesis", Proceedings of the International Conference of Spoken Language Processing, 30 Nov-4 Dec, Sydney, Australia.

Hazan, V. & Barrett, S. (1998) "The development of perceptual cue-weighting in children aged 6 to 12". Proceedings of International Conference of Speech and Language Processing, 30 Nov-4 Dec, Sydney, Australia.

Hazan, V., Fourcin, A., Abberton , E. & Wilson, G. (1998) "Speech pattern audiometry for the evaluation of the speech perception abilities of deaf children". Proceedings of the 18th International Congress on Education of the Deaf - 1995. Editor: A. Weisel, Tel Aviv, Israel: Ramot Publications - Tel Aviv University .

Hazan, V. & Simpson, A. (1998) "The effect of cue-enhancement on consonant perception by non-native listeners: preliminary results". Proceedings of StiLL Workshop, Stockholm, May 1998, 119-122.

Hazan, V. & Simpson, A. (1998) "The effect of cue-enhancement on the intelligibility of nonsense word and sentence materials presented in noise". Speech Communication, vol. 24, 211-226.

Hazan, V., Simpson, A. & Huckvale, M. (1998) "Enhancement techniques to improve the intelligibility of consonants in noise : Speaker and listener effects". Proceedings of International Conference of Speech and Language Processing, 30 Nov-4 Dec, Sydney, Australia.

Hazan, V. & Barrett, S. (1999) "The development of phoneme categorisation in children aged 6 to 12". Proceedings of the International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999, vol. 3, 2493-2496.

Hazan, V. & van Dommelen, W. (1999) "Phonetics education in Europe". Proceedings of ESCA/SOCRATES Workshop on Methods and Tools for Speech Science Education (MATISSE), UCL London April 1999, 101-104.

House, J., Dankovicova, J. & Huckvale, M. (1999) "Intonation modelling in ProSynth: an integrated prosodic approach to speech synthesis". Proceedings of the International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999, vol. 3, 2343-2346.

Huckvale, M. (1998) "Opportunities for Re-convergence of Engineering and Cognitive Science Accounts of Spoken Word Recognition", Proceedings of Institute of Acoustics Conference on Speech and Hearing, Windermere, November 1998.

Huckvale, M. (1999) "Representation and processing of linguistic structures for an all-prosodic synthesis system using XML", Proc. EuroSpeech-99, Budapest, pp 1847-1850.

Huckvale, M. , Bowerman, C., Eriksson, A., Pompino-Marschall, B., Rosner, M., Tatham, M., Williams, B. & Wolters, M. (1999) "Computer Aided Learning and use of the Internet", in The Landscape of Future Education in Speech Communication Sciences: 2 Proposals, G. Bloothooft et al. (eds.), Utrecht: OTS Publications.

Markham, D.J. (1998) "The perception of nativeness: Variable speakers and flexible listeners". Proceedings of the 5th International Conference on Spoken Language Processing, Sydney, Australia, December 1998, Vol 6, pp 2651-2654.

Markham, D.J. (1999) "Naive imitation of second-language stimuli: Duration and F0". Proceedings of the 14th International Congress of Phonetic Sciences, San Francisco, USA, August 1999, Vol 2, pp 1145-1148.

Markham, D.J. (1999) "Listeners and disguised voices: the imitation and perception of dialectal accent". Journal of Forensic Linguistics, 6 (2), pp 289-299.

Marshall, J., Black, M. & Byng, S. (Eds) (1998) The Sentence Processing Resource Pack, London: Winslow Press.

Marshall, M., Black, M. & Byng, S. (1998) "Working with Sentences: a handbook for Aphasia therapists". In J. Marshall, M. Black and S. Byng (Eds) The Sentence Processing Resource Pack. London: Winslow Press.

McArdle, B., Hazan, V. and Prasher, D. (1999) "A comparison of Speech Pattern audiometry and Manchester Junior Word Lists in hearing impaired children". British Journal of Audiology, 33, 383-393.

Ortega, M. & Hazan, V. (1999) "Enhancing acoustic cues to aid L2 speech perception". Proceedings of the International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999, vol. 1, 117-120.

Rosen, S. & Howell, P. (1998) Signals and Systems for Speech and Hearing (Japanese language edition) (T. Arai & T. Sugawara, Translators). Tokyo: Kaibundo.

Rosen, S., Baker, R. J. & Darling, A. M. (1998) "Auditory filter nonlinearity at 2 kHz in normal hearing listeners". Journal of the Acoustical Society of America, 103(5), 2539-2550.

Rosen, S., Faulkner, A. & Wilkinson, L. (1999) "Perceptual adaptation by normal listeners to upward shifts of spectral information in speech and its relevance for users of cochlear implants". Journal of the Acoustical Society of America, 106, 3629-3636.

van der Lely, H. K. J., Rosen, S. & McClelland, A. (1998) "Evidence for a grammar-specific deficit in children". Current Biology, 8, 1253-1258.

van Dommelen, W., Hazan, V., Aulanko, R., Bryndal, M., Ciobanu, G., Cutugno, F., Fougeron, C., Köster, J.P., Machuca, M., Turk, A. (1999) "Phonetics". In Bloothooft, G. et al. (eds) The Landscape of Future Education in Speech Communication Sciences: 3 Recommendations, OTS, Utrecht.

 

Vance, M., Dry, S., & Rosen, S. (1999) "Auditory processing deficits in a teenager with Landau-Kleffner Syndrome". Neurocase, 5, 545-554.

Wells, J.C. (1999) "Which pronunciation do you prefer?". IATEFL Issues 149, June - July 1999, The Changing Language, 10-11.

Wells, J.C. (1999) "Pronunciation preferences in British English: a new survey" Proceedings of the International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999

Wichmann, A. & House, J. (1999) "Discourse constraints on peak timing in English: experimental evidence". Proceedings of the International Congress of Phonetic Sciences, San Francisco, 1-7 August 1999, vol. 3, 1765-1768.


Alec